# Foundations Of AIML
## Session 11
### Experiment 2.1: Fine-tuning pre-trained CNN

We have seen how to use the pre-trained model as a black box for feature extraction, which gave us decent accuracy. However, if we have sufficient data, we can *tweak* (fine-tune) the learned model so that it extracts features specific to our new dataset. Note that we have 5000 training images, which is not sufficient to train a deep model from scratch, but may be enough to *tweak* the pre-trained model to our dataset. We will first see what happens when we tweak only a small, specific part of the pre-trained model, and then how to tweak the entire model. How many layers to tweak depends on the amount of available data. Fine-tuning to specific data, when done properly, is almost always beneficial.

In [0]:
# http://pytorch.org/
from os import path
from wheel.pep425tags import get_abbr_impl, get_impl_ver, get_abi_tag
platform = '{}{}-{}'.format(get_abbr_impl(), get_impl_ver(), get_abi_tag())

accelerator = 'cu80' if path.exists('/opt/bin/nvidia-smi') else 'cpu'

!pip install -q http://download.pytorch.org/whl/{accelerator}/torch-0.3.0.post4-{platform}-linux_x86_64.whl torchvision
import torch

In [2]:
%

datalab/


In [0]:
# The commands below connect this notebook to your Google Drive. Run them once per new session (the connection may also need refreshing every 24 hours).

!apt-get install -y -qq software-properties-common python-software-properties module-init-tools
!add-apt-repository -y ppa:alessandro-strada/ppa 2>&1 > /dev/null
!apt-get update -qq 2>&1 > /dev/null
!apt-get -y install -qq google-drive-ocamlfuse fuse

# Authentication for your google drive
from google.colab import auth
auth.authenticate_user()
from oauth2client.client import GoogleCredentials
creds = GoogleCredentials.get_application_default()

# Authentication for the wrapper libraries
import getpass
!google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret} < /dev/null 2>&1 | grep URL
vcode = getpass.getpass()
!echo {vcode} | google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret}

# Create a directory and mount Google Drive using that directory.
!mkdir -p MyDrive
!google-drive-ocamlfuse MyDrive

%cd MyDrive/Session-11-Experiments

gpg: keybox '/tmp/tmpjybi_173/pubring.gpg' created
gpg: /tmp/tmpjybi_173/trustdb.gpg: trustdb created
gpg: key AD5F235DF639B041: public key "Launchpad PPA for Alessandro Strada" imported
gpg: Total number processed: 1
gpg:               imported: 1
Please, open the following URL in a web browser: https://accounts.google.com/o/oauth2/auth?client_id=32555940559.apps.googleusercontent.com&redirect_uri=urn%3Aietf%3Awg%3Aoauth%3A2.0%3Aoob&scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive&response_type=code&access_type=offline&approval_prompt=force


In [0]:
# Importing pytorch packages
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import torch.backends.cudnn as cudnn
import torchvision
import torchvision.transforms as transforms
from torch.autograd import Variable
# Importing config.py file
import config as cf
from utils import *
from light_cnn import network_9layers
from data_loader import *
## Importing python packages
import cv2
import os
import sys
import time
import datetime
import numpy as np
import math
import matplotlib.pyplot as plt

In [21]:
img_root = cf.data_dir+'IMFDB_final/'

train_list_file = cf.data_dir+'IMFDB_train.txt'   #### 5000 images for training
val_list_file = cf.data_dir+'IMFDB_test.txt'      #### 1095 images for validation


train_image_list = [line.rstrip('\n') for line in open(train_list_file)]
val_image_list = [line.rstrip('\n') for line in open(val_list_file)]

print(len(train_image_list), len(val_image_list))

trainloader = torch.utils.data.DataLoader(custom_data_loader(img_root = img_root, image_list = train_list_file, crop=False,
                                                             resize = True, resize_shape=[128,128]), 
                                           batch_size=32, num_workers=16, shuffle = True, pin_memory=False)

testloader = torch.utils.data.DataLoader(custom_data_loader(img_root = img_root, image_list = val_list_file, crop=False, mirror=False, 
                                                           resize = True, resize_shape=[128,128]), 
                                           batch_size=10, num_workers=5, shuffle = False, pin_memory=False)


classes = ['AamairKhan', 'Rimisen', 'Kajol', 'KareenaKapoor','RishiKapoor', 'AmrishPuri', 'AnilKapoor', 'AnupamKher', 'BomanIrani', 'HrithikRoshan', 'KajalAgarwal', 'KatrinaKaif', 'Madhavan', 'MadhuriDixit', 'Umashri', 'Trisha']

(5000, 1095)


In [0]:
# Checking for GPU instance
use_cuda = torch.cuda.is_available()
# Initializing the best accuracy to zero
best_acc = 0
num_classes = 16

### Net surgery
The original pre-trained model has a last layer (fc2) that predicts 79077 classes, but we want a last layer for only 16 classes.
We chop off the fc2 with 79077 classes and *implant* a new classifier (the MLP model we used in the previous experiment), which predicts 16 classes. Note that we could also implant a single FC layer with 16 outputs (instead of a 3-layer MLP).
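
The surgery can be tried out on a toy network first. The following is a minimal sketch, assuming a hypothetical `TinyNet` that only mimics the `features` / `fc1` / `fc2` layout; it is not the LightCNN architecture, and the layer sizes other than 256 and 16 are made up for illustration.

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    """Hypothetical stand-in for a pre-trained network (not LightCNN)."""
    def __init__(self, num_old_classes=100):
        super(TinyNet, self).__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 4, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),               # output is 4 channels x 4 x 4
        )
        self.fc1 = nn.Linear(4 * 4 * 4, 256)
        self.fc2 = nn.Linear(256, num_old_classes)  # old classifier head

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), -1)
        x = self.fc1(x)
        return self.fc2(x)

num_classes = 16
net = TinyNet()

# Surgery: assigning a module to the attribute replaces the registered fc2.
net.fc2 = nn.Sequential(nn.Linear(256, 64), nn.BatchNorm1d(64), nn.ReLU(),
                        nn.Linear(64, 32), nn.BatchNorm1d(32), nn.ReLU(),
                        nn.Linear(32, num_classes))

net.eval()
with torch.no_grad():
    scores = net(torch.randn(8, 1, 128, 128))   # a batch of 8 grayscale 128x128 images
print(scores.shape)
```

Because `forward` calls `self.fc2`, the implanted classifier is used automatically and the network now outputs one score per new class.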

In [0]:
feature_net = network_9layers()   ### creates an object of this network architecture
feature_net = torch.load(cf.data_dir+'light_cnn/light_cnn_ckpt.t7')['net']

### starting the surgery
layers_to_remove = ['fc2']
for layers_ in layers_to_remove:        
    del(feature_net._modules[layers_])
    
#### old fc2 removed.

classifier = nn.Sequential(nn.Linear(256, 64), nn.BatchNorm1d(64), nn.ReLU(),
                           nn.Linear(64, 32), nn.BatchNorm1d(32), nn.ReLU(),
                           nn.Linear(32, num_classes))

### implanting a new fc2
feature_net.fc2 = classifier
if use_cuda:
    feature_net.cuda()

In [0]:
### Initializing the loss
criterion = nn.CrossEntropyLoss()
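
`nn.CrossEntropyLoss` applies a log-softmax to the raw class scores and then takes the negative log-likelihood of the target class (averaged over the batch by default). The plain-Python sketch below computes the same quantity for a single sample; the function name `cross_entropy` and the example scores are illustrative assumptions.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy for one sample: -log(softmax(logits)[target])."""
    m = max(logits)   # subtract the max before exponentiating, for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

# A classifier that gives all 16 classes the same score ...
uniform = cross_entropy([0.0] * 16, target=3)
# ... versus one that strongly prefers the correct class.
confident = cross_entropy([10.0 if i == 3 else 0.0 for i in range(16)], target=3)
print(uniform, confident)
```

The uniform case equals log(16), roughly 2.77, which is about the loss to expect from a freshly implanted 16-class classifier before it has learned anything.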

In [0]:
def train(epoch):
    print('\nEpoch: %d' % epoch)
    feature_net.train()
    train_loss = 0
    correct = 0
    total = 0
    for batch_idx, (inputs, targets) in enumerate(trainloader):
        if use_cuda:
            inputs, targets = inputs.cuda(), targets.cuda()
        optimizer.zero_grad()
        inputs, targets = Variable(inputs), Variable(targets)
        outputs = feature_net(inputs)      ### notice that the pre-trained network has an implant classifier which directly outputs the 16 class prediction scores

        
        size_ = outputs.size()
        outputs_ = outputs.view(size_[0], num_classes)
        loss = criterion(outputs_, targets)
        loss.backward()
        optimizer.step()

        train_loss += loss.data[0]
        _, predicted = torch.max(outputs_.data, 1)
        total += targets.size(0)
        correct += predicted.eq(targets.data).cpu().sum()
        
        progress_bar(batch_idx, len(trainloader), 'Loss: %.3f | Acc: %.3f%% (%d/%d)'
                         % (train_loss/(batch_idx+1), 100.*correct/total, correct, total))
        
    train_loss_file.write('%d %.3f %.3f\n' %(epoch, train_loss/len(trainloader), 100.*correct/total))

In [0]:
def test(epoch):
    global best_acc
    feature_net.eval()
    test_loss = 0
    correct = 0
    total = 0
    for batch_idx, (inputs, targets) in enumerate(testloader):
        if use_cuda:
            inputs, targets = inputs.cuda(), targets.cuda()
        inputs, targets = Variable(inputs, volatile=True), Variable(targets)
        outputs = feature_net(inputs)
        size_ = outputs.size()
        outputs_ = outputs.view(size_[0], num_classes)
        loss = criterion(outputs_, targets)

        test_loss += loss.data[0]
        _, predicted = torch.max(outputs_.data, 1)
        total += targets.size(0)
        correct += predicted.eq(targets.data).cpu().sum()
        
        progress_bar(batch_idx, len(testloader), 'Loss: %.3f | Acc: %.3f%% (%d/%d)'
                         % (test_loss/(batch_idx+1), 100.*correct/total, correct, total))
        
    val_loss_file.write('%d %.3f %.3f\n' %(epoch,  test_loss/len(testloader), 100.*correct/total))

    # Save checkpoint.
    acc = 100.*correct/total
    if acc > best_acc:
        print('Saving..')
        state = {
            'net': classifier,
            'acc': acc,
            'epoch': epoch,
        }
        if not os.path.isdir(cf.data_dir+'checkpoint'):
            os.mkdir(cf.data_dir+'checkpoint')
        torch.save(state, cf.data_dir+'checkpoint/checkpoint_ckpt.t7')
        best_acc = acc
    
    return test_loss/len(testloader)

In [0]:
experiment = 'lightnet_finetune_fc1_fc2_IMFDB'
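### buffering=1 (line-buffered) flushes every epoch's line; buffering=0 is rejected for text files in Python 3
train_loss_file = open(cf.data_dir+experiment+"train_loss.txt", "w", 1)
val_loss_file = open(cf.data_dir+experiment+"val_loss.txt", "w", 1)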

In [0]:
### Tweak only selected parts: fc1 and fc2 (fc2 is the implanted 3-layer MLP).

layers_to_finetune = [{'params': feature_net.fc1.parameters()},
                      {'params':feature_net.fc2.parameters()}]

optimizer = optim.Adam(layers_to_finetune, lr=0.001)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, 'min', factor=0.5, patience=2, verbose=True)   #### dynamic LR scheduler
for epoch in range(0, 30):
    train(epoch)
    test_loss = test(epoch)
    scheduler.step(test_loss)
    
train_loss_file.close()
val_loss_file.close()


Epoch: 0
image not none resizing
ERROR: couldn't find image ->  isfile data/IMFDB_final/KajalAgarwal/Misc/images/KajolAggarwal_133.jpg
ERROR: couldn't find image -> image_is_none data/IMFDB_final/KajalAgarwal/Misc/images/KajolAggarwal_133.jpg
creating a default image

In [1]:
training_curves(cf.data_dir+experiment)

NameError: ignored

In [0]:
### After training, we load the checkpoint that performed best on validation data (this avoids picking an overfitted model)
classifier = torch.load(cf.data_dir+'checkpoint/checkpoint_ckpt.t7')['net'].eval()
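
The checkpoint written in `test()` stores the `classifier` module, and the evaluation code calls `feature_net`, so the loaded object only affects the results if it is attached to `feature_net`. When layers outside the classifier (such as fc1) are fine-tuned too, one option is to checkpoint the weights of the whole network and restore them before evaluating. The sketch below shows that pattern on a hypothetical stand-in network, with an in-memory buffer in place of a checkpoint file; it is an assumption about how this could be done, not the notebook's own code.

```python
import io
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))   # hypothetical stand-in network

# "Best epoch": serialise the weights of every layer (a buffer stands in for a file).
buffer = io.BytesIO()
torch.save(net.state_dict(), buffer)
best_weight = net[0].weight.detach().clone()

# Later epochs keep changing the weights (simulated here by shifting them).
with torch.no_grad():
    for p in net.parameters():
        p.add_(1.0)
drifted = not torch.equal(net[0].weight, best_weight)

# Restore the best weights into the same architecture before evaluating.
buffer.seek(0)
net.load_state_dict(torch.load(buffer))
net.eval()
restored = torch.equal(net[0].weight, best_weight)
print(drifted, restored)
```

Saving a `state_dict` rather than a pickled module also keeps the checkpoint independent of the Python class definitions.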

In [0]:
def eval():
    feature_net.eval()
    
    testloader = torch.utils.data.DataLoader(custom_data_loader(img_root = img_root, image_list = val_list_file, crop=False, mirror=False, 
                                                           resize = True, resize_shape=[128,128]), 
                                           batch_size=1, num_workers=1, shuffle = False, pin_memory=True)
    correct = 0
    total = 0
    conf_mat = np.zeros((num_classes, num_classes))
    total_ = np.zeros((num_classes))
    wrong_predictions = []
    for batch_idx, (inputs, targets) in enumerate(testloader):
        if use_cuda:
            inputs, targets = inputs.cuda(), targets.cuda()
        inputs, targets = Variable(inputs, volatile=True), Variable(targets)
        outputs = feature_net(inputs)
        size_ = outputs.size()
        outputs_ = outputs.view(size_[0], num_classes)
        _, predicted = torch.max(outputs_.data, 1)
        total += targets.size(0)
        correct += predicted.eq(targets.data).cpu().sum()
        prediction = predicted.cpu().numpy()[0]
        targets = targets.data.cpu().numpy()[0]
        total_[targets] +=1
        conf_mat[prediction, targets] +=1     ### index with the numpy values: row = predicted class, column = true class
        
        if prediction != targets:
            wrong_predictions += [[inputs, prediction, targets]]
        
    for k in range(num_classes):
        conf_mat[:,k] /= total_[k]
    return conf_mat, 100.*correct/total, wrong_predictions
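
The normalisation at the end of this function divides each column by the number of samples of that true class, so every column sums to 1 and the diagonal holds the per-class recall. Here is a small NumPy sketch of the same idea on made-up labels; the helper name and the guard against empty classes are additions for illustration.

```python
import numpy as np

def column_normalised_confusion(predictions, targets, num_classes):
    conf = np.zeros((num_classes, num_classes))
    for p, t in zip(predictions, targets):
        conf[p, t] += 1            # row = predicted class, column = true class
    counts = conf.sum(axis=0)      # number of samples of each true class
    counts[counts == 0] = 1        # assumption: avoid dividing by zero for absent classes
    return conf / counts           # broadcasting divides column j by counts[j]

targets     = [0, 0, 0, 1, 1, 2]   # made-up ground truth
predictions = [0, 0, 1, 1, 1, 0]   # made-up predictions
conf = column_normalised_confusion(predictions, targets, num_classes=3)
print(conf)
```

In this toy example class 0 is recognised two times out of three, class 1 always, and the single class-2 sample is mistaken for class 0.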
    

In [0]:
conf, acc, wrong_predictions = eval()
print('Accuracy:', acc, '%')

Whoa!! :o Fine-tuning improved the accuracy by more than 15%

In [0]:
plt.imshow(conf, cmap='jet', vmin=0, vmax = 1)
plt.show()

In [0]:
for w in wrong_predictions[::10]:
    print(classes[w[2]], 'confused with', classes[w[1]])
    plt.imshow(w[0][0][0].data.cpu().numpy(), cmap='gray')
    plt.show()

### Now let's try fine-tuning all layers in the network

In [0]:
feature_net = network_9layers()   ### creates an object of this network architecture
feature_net = torch.load(cf.data_dir+'light_cnn/light_cnn_ckpt.t7')['net']

layers_to_remove = ['fc2']
for layers_ in layers_to_remove:        
    del(feature_net._modules[layers_])
    


classifier = nn.Sequential(nn.Linear(256, 64), nn.BatchNorm1d(64), nn.ReLU(),
                           nn.Linear(64, 32), nn.BatchNorm1d(32), nn.ReLU(),
                           nn.Linear(32, num_classes))

feature_net.fc2 = classifier
if use_cuda:
    feature_net.cuda()

In [0]:
layers_to_finetune = [{'params': feature_net.features.parameters()}, 
                      {'params': feature_net.fc1.parameters()},
                      {'params':feature_net.fc2.parameters()}]
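
Which parameters are handed to the optimizer decides which layers get tweaked: the earlier experiment passed only fc1 and fc2, while the list above also includes `features`. The sketch below illustrates the effect with two hypothetical layers, `backbone` and `head`, standing in for pre-trained and tweaked parts; only `head` is given to the optimizer.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
backbone = nn.Linear(8, 4)   # hypothetical stand-in for layers we do not tweak
head = nn.Linear(4, 3)       # hypothetical stand-in for layers we do tweak

# Only the head's parameters are registered with the optimizer.
optimizer = optim.Adam([{'params': head.parameters()}], lr=0.001)

backbone_before = backbone.weight.detach().clone()
head_before = head.weight.detach().clone()

x = torch.randn(5, 8)
y = torch.tensor([0, 1, 2, 1, 0])
loss = nn.CrossEntropyLoss()(head(backbone(x)), y)
optimizer.zero_grad()
loss.backward()              # gradients are computed for both layers ...
optimizer.step()             # ... but only registered parameters are updated

backbone_unchanged = torch.equal(backbone.weight, backbone_before)
head_changed = not torch.equal(head.weight, head_before)
print(backbone_unchanged, head_changed)
```

Gradients are still computed for the layers left out of the optimizer; setting `requires_grad = False` on their parameters is a common way to skip that work as well.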

In [0]:
experiment = 'lightnet_finetune_all_IMFDB'
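### buffering=1 (line-buffered) flushes every epoch's line; buffering=0 is rejected for text files in Python 3
train_loss_file = open(cf.data_dir+experiment+"train_loss.txt", "w", 1)
val_loss_file = open(cf.data_dir+experiment+"val_loss.txt", "w", 1)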

In [0]:
best_acc = 0
optimizer = optim.Adam(layers_to_finetune, lr=0.001)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, 'min', factor=0.5, patience=2, verbose=True)   #### dynamic LR scheduler
for epoch in range(0, 30):
    train(epoch)
    test_loss = test(epoch)
    scheduler.step(test_loss)
    
train_loss_file.close()
val_loss_file.close()
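
The scheduler used in this loop halves the learning rate (`factor=0.5`) once the validation loss has failed to improve for more than `patience=2` consecutive epochs. The following plain-Python sketch simulates that rule on made-up loss values. It is a simplified illustration, not PyTorch's implementation: it ignores the scheduler's improvement threshold and cooldown options.

```python
def plateau_schedule(val_losses, lr=0.001, factor=0.5, patience=2):
    """Return the learning rate in effect after each epoch's validation loss."""
    best = float('inf')
    bad_epochs = 0
    history = []
    for loss in val_losses:
        if loss < best:
            best = loss
            bad_epochs = 0
        else:
            bad_epochs += 1
        if bad_epochs > patience:   # more than `patience` epochs without improvement
            lr *= factor
            bad_epochs = 0
        history.append(lr)
    return history

# Made-up validation losses: the loss stalls after the second epoch.
lrs = plateau_schedule([1.0, 0.8, 0.9, 0.85, 0.81, 0.7, 0.75])
print(lrs)
```

Here the loss fails to beat 0.8 for three epochs in a row, so the learning rate is halved at the fifth epoch and stays there once the loss improves again.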

In [0]:
training_curves(cf.data_dir+experiment)

In [0]:
classifier = torch.load(cf.data_dir+'checkpoint/checkpoint_ckpt.t7')['net'].eval()

In [0]:
conf, acc, wrong_predictions = eval()
print('Accuracy:', acc, '%')

In [0]:
plt.imshow(conf, cmap='jet', vmin=0, vmax = 1)
plt.show()

In [0]:
for w in wrong_predictions[::10]:
    print(classes[w[2]], 'confused with', classes[w[1]])
    plt.imshow(w[0][0][0].data.cpu().numpy(), cmap='gray')
    plt.show()

Relying on pre-trained networks we improved the accuracy by more than 20%!!

### Different learning rates for different parts of the model

Different parts of the model can have different learning rates (LR). While fine-tuning, it is common to give the implanted layers an LR 10 times higher than that of the pre-trained layers. This helps the implanted layers learn faster, since they start from scratch.
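
Per-layer learning rates can be set through the optimizer's parameter groups: a group without its own `lr` falls back to the optimizer-wide default. A minimal sketch follows, assuming two hypothetical layers and an illustrative base LR of 0.0001 (neither is taken from the experiments above).

```python
import torch.nn as nn
import torch.optim as optim

pretrained = nn.Linear(256, 256)   # hypothetical stand-in for pre-trained layers
implanted = nn.Linear(256, 16)     # hypothetical stand-in for the implanted classifier

base_lr = 0.0001                   # illustrative value, an assumption
optimizer = optim.Adam([
    {'params': pretrained.parameters()},                     # falls back to lr=base_lr
    {'params': implanted.parameters(), 'lr': 10 * base_lr},  # 10 times higher LR
], lr=base_lr)

group_lrs = [group['lr'] for group in optimizer.param_groups]
print(group_lrs)
```

The same list-of-dicts format is what `layers_to_finetune` already uses, so adding an `'lr'` key to the fc2 entry would be enough to try this in the experiments.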