<a href="https://colab.research.google.com/github/yandexdataschool/MLatImperial2021/blob/master/06_lab/finetuning_solution.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
from google.colab import drive
drive.mount('/content/gdrive')

In [None]:
!mkdir -p /content/.kaggle
!cp /content/gdrive/My\ Drive/kaggle.json /content/.kaggle/
!chmod 600 /content/.kaggle/kaggle.json
!ls -l /content/.kaggle

%env KAGGLE_CONFIG_DIR=/content/.kaggle

Go to https://www.kaggle.com/c/dogs-vs-cats and accept the rules to be able to get the data.

In [None]:
!kaggle competitions download --force -c dogs-vs-cats
!unzip train.zip

### Using pre-trained model

Today we're going to build and fine-tune a CNN starting from weights pre-trained on ImageNet, one of the largest image classification datasets.
More about ImageNet: http://image-net.org/
Setup: classify an image into one of 1000 classes.

In [None]:
import scipy as sp
import scipy.misc
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline

In [None]:
import requests

# class labels
LABELS_URL = 'https://s3.amazonaws.com/outcome-blog/imagenet/labels.json'
labels = {int(key):value for (key, value) in requests.get(LABELS_URL).json().items()}

In [None]:
print(list(labels.items())[:5])

### TorchVision
PyTorch has several companion libraries, one of them being [torchvision](https://github.com/pytorch/vision/tree/master/) - it contains a number of popular vision datasets, preprocessing tools and most importantly, [pre-trained models](https://github.com/pytorch/vision/tree/master/torchvision/models).

For now, we're going to use the torchvision Inception-v3 network:
![img](https://hackathonprojects.files.wordpress.com/2016/09/googlenet_diagram.png?w=650&h=192)

Let's first look at the code here: [url](https://github.com/pytorch/vision/blob/master/torchvision/models/inception.py)

In [None]:
from torchvision.models.inception import inception_v3

model = inception_v3(pretrained=True,       # load weights pre-trained on ImageNet
                     transform_input=False, # do not re-normalise the input inside the model
                     progress=True)         # show a download progress bar

model.aux_logits = False # don't predict intermediate logits (yellow layers at the bottom)
model.eval()

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
from skimage.transform import resize

### Predict class probabilities

In [None]:
!wget https://github.com/yandexdataschool/MLatImperial2021/raw/master/06_lab/albatross.jpg

In [None]:
img = resize(plt.imread('albatross.jpg'), (299, 299))
plt.axis("off")
plt.imshow(img)
plt.show()

img = torch.FloatTensor(img.reshape([1, 299, 299, 3]).transpose([0,3,1,2]))

probs = torch.nn.functional.softmax(model(img), dim=-1)

probs = probs.data.numpy()

top_ix = probs.ravel().argsort()[::-1][:10]  # indices of the 10 most probable classes
print('top-10 classes are: \n [prob : class label]')
for ix in top_ix:
    print('%.4f :\t%s' % (probs.ravel()[ix], labels[ix].split(',')[0]))
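
Picking the most probable classes comes down to a softmax followed by a descending sort. The sketch below shows this with plain NumPy on made-up logits; `softmax`, `top_k` and the `toy_*` values are hypothetical names for illustration and are not part of the seminar code.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def top_k(probs, k):
    """Return the indices of the k largest probabilities, most likely first."""
    return np.argsort(probs)[::-1][:k]

# Hypothetical logits for a 5-class problem (not real model output).
toy_logits = np.array([2.0, 0.5, 3.0, -1.0, 1.0])
toy_probs = softmax(toy_logits)
toy_top3 = top_k(toy_probs, k=3)
print(toy_top3, toy_probs[toy_top3])
```

Reversing the full sort and then slicing makes it easy to see that exactly `k` indices are returned.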



### Having fun with pre-trained nets

In [None]:
!wget http://cdn.com.do/wp-content/uploads/2017/02/Donal-Trum-Derogar.jpeg -O img.jpg

In [None]:
img = resize(plt.imread('img.jpg')[:-100,200:-150], (299,299))
plt.imshow(img)
plt.axis("off")
plt.show()

img = torch.FloatTensor(img.reshape([1, 299, 299, 3]).transpose([0,3,1,2]))

probs = torch.nn.functional.softmax(model(img), dim=-1)

probs = probs.data.numpy()

top_ix = probs.ravel().argsort()[::-1][:10]  # indices of the 10 most probable classes
print('top-10 classes are: \n [prob : class label]')
for ix in top_ix:
    print('%.4f :\t%s' % (probs.ravel()[ix], labels[ix].split(',')[0]))



# Grand-quest: Dogs Vs Cats
* original competition: https://www.kaggle.com/c/dogs-vs-cats
* 25k JPEG images of various sizes, 2 classes (guess what)

### Your main objective
* In this seminar your goal is to fine-tune a pre-trained model to distinguish between the two rival animals
* The first step is to just reuse some network layer as features

### As before, we will use the auxiliary functions you have seen on Monday

In [None]:
from IPython.display import clear_output
from sklearn.metrics import accuracy_score

class Logger:
  def __init__(self):
    self.train_loss_batch = []
    self.train_loss_epoch = []
    self.test_loss_batch = []
    self.test_loss_epoch = []
    self.train_batches_per_epoch = 0
    self.test_batches_per_epoch = 0
    self.epoch_counter = 0
    
    self.accuracy = []

  def fill_train(self, loss):
    self.train_loss_batch.append(loss)
    self.train_batches_per_epoch += 1

  def fill_test(self, loss):
    self.test_loss_batch.append(loss)
    self.test_batches_per_epoch += 1
    
  def fill_accuracy(self, y_true, y_pred):    
    self.accuracy.append(accuracy_score(y_true, y_pred))

  def finish_epoch(self):
    self.train_loss_epoch.append(np.mean(
        self.train_loss_batch[-self.train_batches_per_epoch:]
    ))
    self.test_loss_epoch.append(np.mean(
        self.test_loss_batch[-self.test_batches_per_epoch:]
    ))
    self.train_batches_per_epoch = 0
    self.test_batches_per_epoch = 0
    
    clear_output()
  
    print("epoch #{} \t train_loss: {:.8} \t test_loss: {:.8} \t test_acc: {:.8}".format(
              self.epoch_counter,
              self.train_loss_epoch[-1],
              self.test_loss_epoch [-1],
              self.accuracy[-1]
          ))
    
    self.epoch_counter += 1

    plt.figure(figsize=(18, 5))

    plt.subplot(1, 3, 1)
    plt.plot(self.train_loss_batch, label='train loss')
    plt.xlabel('# batch iteration')
    plt.ylabel('loss')
    plt.legend()

    plt.subplot(1, 3, 2)
    plt.plot(self.train_loss_epoch, label='average train loss')
    plt.plot(self.test_loss_epoch , label='average test loss' )
    plt.legend()
    plt.xlabel('# epoch')
    plt.ylabel('loss')
    
    plt.subplot(1, 3, 3)
    plt.plot(self.accuracy, label='test acc')
    plt.xlabel('# epoch')
    plt.ylabel('acc')
    plt.legend()    
    
    plt.show();
    
    
def preprocess_data(X, y):
  X_preprocessed = torch.tensor(X)
  y_preprocessed = torch.tensor(y)
  return X_preprocessed, y_preprocessed


def get_batches(X, y, batch_size, shuffle=False):
  if shuffle:
    shuffle_ids = np.random.permutation(len(X))
    X = X[shuffle_ids].copy()
    y = y[shuffle_ids].copy()
  for i_picture in range(0, len(X), batch_size):
    # Get batch and preprocess it:
    batch_X = X[i_picture:i_picture + batch_size]
    batch_y = y[i_picture:i_picture + batch_size]
    
    # Yield the batch: 'yield' turns this function into a generator
    # that produces one preprocessed batch per iteration
    yield preprocess_data(batch_X, batch_y)    

We also introduce some new tools. They are very convenient in PyTorch when you need to work with data that does not fit in memory but can easily be loaded in batches, for example, images.

In [None]:
from torch.utils.data import Dataset 
from PIL import Image
from torchvision import transforms
import os

In [None]:
class MyDataset(Dataset):
    """
    This class inherits from the PyTorch Dataset.
    It defines how the data is loaded and preprocessed.
    """
    
    def __init__(self, data_paths, transform_X=None):
        self.data_paths = data_paths
        self.transform_X = transform_X
    
    def __getitem__(self, index):
        x = Image.open(self.data_paths[index])
        if self.transform_X:
            x = self.transform_X(x)
        y = "cat" in self.data_paths[index]
        return x, np.float32(y)

    def __len__(self):
        return len(self.data_paths)
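
To see how a `Dataset` and a `DataLoader` cooperate without touching any image files, here is a minimal sketch. `SquaresDataset` and `toy_loader` are hypothetical names; the dataset builds each item on demand, and the loader stacks items into batches.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class SquaresDataset(Dataset):
    """Hypothetical dataset that builds each (x, y) pair on demand."""

    def __init__(self, n_items):
        self.n_items = n_items

    def __len__(self):
        return self.n_items

    def __getitem__(self, index):
        x = torch.tensor([float(index)])      # one feature
        y = torch.tensor(float(index) ** 2)   # scalar target
        return x, y

# The loader calls __getitem__ for each index and stacks the results.
toy_loader = DataLoader(SquaresDataset(10), batch_size=4, shuffle=False)
batch_shapes = [(tuple(x.shape), tuple(y.shape)) for x, y in toy_loader]
print(batch_shapes)
```

The last batch is smaller because 10 items do not divide evenly into batches of 4; passing `drop_last=True` to the loader would discard it instead.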

In [None]:
# Define path to folder with images
train_paths = ["./train/" + name for name in os.listdir("train/")]

# Here I split val/train half and half
val_paths = train_paths[:12500]
train_paths= train_paths[12500:]

len(val_paths), len(train_paths), np.sum(["cat" in path for path in val_paths]),\
                                  np.sum(["cat" in path for path in train_paths])
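
`os.listdir` returns entries in arbitrary order, so the class balance of a slice-based split can differ between machines. If a reproducible split is wanted, one option is to sort the paths and shuffle them with a fixed seed. The helper below is a sketch under that assumption; `split_paths` and the toy file names are hypothetical.

```python
import random

def split_paths(paths, val_fraction=0.5, seed=0):
    """Shuffle a sorted copy of `paths` with a fixed seed, then split it."""
    shuffled = sorted(paths)                  # removes dependence on listing order
    random.Random(seed).shuffle(shuffled)     # deterministic shuffle
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[:n_val], shuffled[n_val:]

# Hypothetical file names that mimic the competition's naming scheme.
toy_paths = [f"./train/{kind}.{i}.jpg" for kind in ("cat", "dog") for i in range(10)]
toy_val, toy_train = split_paths(toy_paths)
print(len(toy_val), len(toy_train))
```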

Since we are going to use a pretrained model, we need **TO MAKE SURE** that we preprocess the data in the same way as it was done during training.

In this case, we need to

- Resize the image
- Normalise it
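
Normalisation here means subtracting a per-channel mean and dividing by a per-channel standard deviation. The NumPy sketch below spells out that arithmetic on a tiny made-up image; `normalise` and `toy_image` are hypothetical names, and the image is assumed to be RGB with values in [0, 1].

```python
import numpy as np

# Channel statistics commonly used with ImageNet-pretrained torchvision models.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])

def normalise(image_hwc):
    """Normalise an RGB image of shape (H, W, 3) with values in [0, 1]."""
    # Broadcasting applies each mean/std to its own channel.
    return (image_hwc - IMAGENET_MEAN) / IMAGENET_STD

# A hypothetical 2x2 image in which every pixel equals the channel means.
toy_image = np.tile(IMAGENET_MEAN, (2, 2, 1))
toy_normalised = normalise(toy_image)
print(toy_normalised.shape, toy_normalised.max())
```

A pixel that equals the channel means maps to zero, which is a quick sanity check for any preprocessing pipeline.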

In [None]:
means = np.array((0.485, 0.456, 0.406))
stds = np.array((0.229, 0.224, 0.225))

transform_X = transforms.Compose([
    transforms.Resize((299, 299)),
    transforms.ToTensor(),
    transforms.Normalize(means, stds),
])


subset_of_train = 5000

train_dataset = MyDataset(train_paths[:subset_of_train], transform_X=transform_X)

train_batch_gen = torch.utils.data.DataLoader(train_dataset, 
                                              batch_size=256,
                                              shuffle=True,
                                              num_workers=10)

transform_test = transforms.Compose([
    transforms.Resize((299, 299)),
    transforms.ToTensor(),
    transforms.Normalize(means, stds),
])

subset_of_val = 1000

val_dataset = MyDataset(val_paths[:subset_of_val], transform_X=transform_test)

val_batch_gen = torch.utils.data.DataLoader(val_dataset, 
                                            batch_size=256,
                                            shuffle=False,
                                            num_workers=10)

# If you do not understand what is going on, run the loop below and see the output shapes
# for (x_batch, y_batch) in train_batch_gen:

#     print('X:', type(x_batch), x_batch.shape)
#     print('y:', type(y_batch), y_batch.shape)
#     break

# Task 1. Use standard sklearn to train

Now we will take the Inception model loaded above and use its output as features. Since we do not want the ImageNet classification, we replace the last layer with an identity.
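
The effect of swapping the last layer for an identity is easiest to see on a tiny model. In this sketch `ToyClassifier` is a hypothetical stand-in for a pretrained network, and the built-in `torch.nn.Identity` plays the role of the identity layer.

```python
import torch
import torch.nn as nn

class ToyClassifier(nn.Module):
    """Hypothetical stand-in for a pretrained network: a body plus an `fc` layer."""

    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(8, 4), nn.ReLU())
        self.fc = nn.Linear(4, 10)   # original classification layer

    def forward(self, x):
        return self.fc(self.body(x))

toy_model = ToyClassifier().eval()
toy_input = torch.randn(2, 8)

with torch.no_grad():
    logits_shape = toy_model(toy_input).shape     # class scores: (2, 10)
    toy_model.fc = nn.Identity()                  # drop the classifier
    features_shape = toy_model(toy_input).shape   # penultimate features: (2, 4)

print(logits_shape, features_shape)
```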

In [None]:
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
device

In [None]:
# create layer that returns unchanged input
class Identity(torch.nn.Module):

    def __init__(self):
        super(Identity, self).__init__()

    def forward(self, x):
        return x

In [None]:
from tqdm import tqdm

model.eval()
model.fc = Identity()
model.to(device)
new_X_train, new_y_train = [], []
for (X_batch, y_batch) in tqdm(train_batch_gen):
    with torch.no_grad():
        new_X_train.extend(model(X_batch.to(device)).detach().cpu().numpy())
        new_y_train.extend(y_batch.detach().cpu().numpy())

new_X_train = np.array(new_X_train)
new_y_train = np.array(new_y_train)        

In [None]:
new_X_test, new_y_test = [], []
for (X_batch, y_batch) in tqdm(val_batch_gen):
    with torch.no_grad():
        new_X_test.extend(model(X_batch.to(device)).detach().cpu().numpy())
        new_y_test.extend(y_batch.detach().cpu().numpy())
        
new_X_test = np.array(new_X_test)
new_y_test = np.array(new_y_test)

In [None]:

from sklearn.linear_model import LogisticRegression

logreg = LogisticRegression(solver='liblinear')
logreg.fit(new_X_train, new_y_train)

print((logreg.predict(new_X_test) == new_y_test).mean())

# Task 2. Use the outputs of the pretrained net as input for a NN

To explore the power of this technique, let's take just one batch (256 images) as training data.

In [None]:
subset_of_train = 256
train_dataset = MyDataset(train_paths[:subset_of_train], transform_X=transform_X)
train_batch_gen = torch.utils.data.DataLoader(train_dataset, 
                                              batch_size=256,
                                              shuffle=True,
                                              num_workers=10)

model.eval()
model.fc = Identity()
model.to(device)
new_X_train, new_y_train = [], []
for (X_batch, y_batch) in tqdm(train_batch_gen):
    with torch.no_grad():
        new_X_train.extend(model(X_batch.to(device)).detach().cpu().numpy())
        new_y_train.extend(y_batch.detach().cpu().numpy())

new_X_train = np.array(new_X_train)
new_y_train = np.array(new_y_train)

Now we define our new NN head

In [None]:
class NetHead(nn.Module):
    def __init__(self):
        super().__init__()
        
        self.head = nn.Sequential(
            torch.nn.Linear(2048, 16),
            torch.nn.ELU(),
            torch.nn.Linear(16, 1)
        )
        
    def forward(self, input):
        return self.head(input)

tn = NetHead().to(device)

And train it as we did before

In [None]:
loss_function = torch.nn.BCEWithLogitsLoss()
learning_rate = 0.001
optimizer = torch.optim.Adam(tn.parameters(), lr=learning_rate)

In [None]:
logger = Logger()

In [None]:
n_epochs = 20

for i_epoch in range(n_epochs):
    tn.train()
    for (X_batch, y_batch) in get_batches(new_X_train, new_y_train, batch_size=128):
        
        loss = loss_function(tn(X_batch.to(device)), y_batch.view(-1,1).to(device))

        tn.zero_grad()
        loss.backward()
        optimizer.step()

        logger.fill_train(loss.item())
  
    tn.eval()
    y_true = []
    y_pred = []
    for (X_batch, y_batch) in get_batches(new_X_test, new_y_test, batch_size=256):
        with torch.no_grad():
            y_net = tn(X_batch.to(device))
            loss = loss_function(y_net, y_batch.view(-1,1).to(device))
            y_pred.extend(y_net.detach().cpu().numpy())
            y_true.extend(y_batch.view(-1,1).detach().cpu().numpy())
            logger.fill_test(loss.item())
    logger.fill_accuracy(np.array(y_true), np.array(y_pred) > 0.)
    logger.finish_epoch()

Impressive, right?
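
The accuracy above is computed by thresholding the raw network outputs at zero rather than at 0.5. That works because `BCEWithLogitsLoss` expects logits, and a sigmoid exceeds 0.5 exactly when its input is positive. A small NumPy check, with hypothetical `toy_logits`:

```python
import numpy as np

def sigmoid(z):
    """Logistic function that maps logits to probabilities."""
    return 1.0 / (1.0 + np.exp(-z))

toy_logits = np.array([-2.0, -0.1, 0.3, 4.0])   # hypothetical raw outputs

by_logit = toy_logits > 0.0                      # threshold the logits
by_probability = sigmoid(toy_logits) > 0.5       # threshold the probabilities
print(np.array_equal(by_logit, by_probability))
```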

# Task 3. Use pretrained net to define new model

In [None]:
class TransferredNet(nn.Module):
    def __init__(self, pretrained_model):
        super().__init__()
        self.pretrained_model = pretrained_model
        self.pretrained_model.fc = Identity()
        
        self.head = nn.Sequential(
            torch.nn.Linear(2048, 16),
            torch.nn.ELU(),
            torch.nn.Linear(16, 1)
        )
        
    def forward(self, input):
        return self.head(self.pretrained_model(input))

tn = TransferredNet(model).to(device)

for param in tn.pretrained_model.parameters():
    param.requires_grad = False

In [None]:
loss_function = torch.nn.BCEWithLogitsLoss()
learning_rate = 0.001

# Note that we pass the optimiser only the parameters we really want to optimise (i.e. those that receive gradients)
optimizer = torch.optim.Adam(filter(lambda p: p.requires_grad, tn.parameters()), lr=learning_rate)
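
A quick way to confirm that freezing worked is to count trainable parameters and check which weights receive gradients. The sketch below does this on a hypothetical two-layer model (`toy_body`, `toy_head`); the same check could be applied to any `nn.Module`.

```python
import torch
import torch.nn as nn

# Hypothetical two-part model: a "body" to freeze and a "head" to train.
toy_body = nn.Linear(8, 4)
toy_head = nn.Linear(4, 1)
toy_net = nn.Sequential(toy_body, nn.ReLU(), toy_head)

for param in toy_body.parameters():
    param.requires_grad = False          # freeze the body

trainable = [p for p in toy_net.parameters() if p.requires_grad]
n_trainable = sum(p.numel() for p in trainable)
n_total = sum(p.numel() for p in toy_net.parameters())
print(n_trainable, "of", n_total, "parameters are trainable")

# Only the trainable parameters are handed to the optimiser.
toy_optimizer = torch.optim.Adam(trainable, lr=1e-3)

# After a backward pass, frozen weights have no gradient at all.
toy_net(torch.randn(3, 8)).sum().backward()
body_has_grad = toy_body.weight.grad is not None
head_has_grad = toy_head.weight.grad is not None
print(body_has_grad, head_has_grad)
```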

In [None]:
logger = Logger()

In [None]:
n_epochs = 10

for i_epoch in range(n_epochs):
    tn.train()
    tn.pretrained_model.eval()
    print("Train:" + "-"*30 + "->\n")
    for (X_batch, y_batch) in train_batch_gen:
        
        loss = loss_function(tn(X_batch.to(device)), y_batch.view(-1,1).to(device))

        tn.zero_grad()
        loss.backward()
        optimizer.step()

        logger.fill_train(loss.item())
  
    tn.eval()
    y_true = []
    y_pred = []
    print("Eval:" + "-"*30 + "->\n")
    for (X_batch, y_batch) in val_batch_gen:
        with torch.no_grad():
            y_net = tn(X_batch.to(device))
            loss = loss_function(y_net, y_batch.view(-1,1).to(device))
            y_pred.extend(y_net.detach().cpu().numpy())
            y_true.extend(y_batch.view(-1,1).detach().cpu().numpy())
            logger.fill_test(loss.item())
    logger.fill_accuracy(np.array(y_true), np.array(y_pred) > 0.)
    logger.finish_epoch()

Before the next step, it's a good idea to save the weights, since the procedure is very unstable.

In [None]:
torch.save(tn, "trained_head.pt")
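
Saving the whole module pickles the object, so loading it later requires the same class definitions to be available. A common alternative is to save only the `state_dict` and rebuild the model from code. The sketch below assumes a hypothetical small head (`make_head`) and a temporary file path; it is not the seminar's model.

```python
import os
import tempfile

import torch
import torch.nn as nn

def make_head():
    """Hypothetical small head, rebuilt from code before loading weights."""
    return nn.Sequential(nn.Linear(4, 2), nn.ELU(), nn.Linear(2, 1))

original = make_head()
checkpoint_path = os.path.join(tempfile.mkdtemp(), "toy_head_state.pt")
torch.save(original.state_dict(), checkpoint_path)      # weights only

restored = make_head()                                  # fresh random weights
restored.load_state_dict(torch.load(checkpoint_path))   # overwrite with saved ones

toy_batch = torch.randn(3, 4)
with torch.no_grad():
    same_output = torch.allclose(original(toy_batch), restored(toy_batch))
print(same_output)
```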

OK, now, to get an even better result, one can fine-tune the body network as well.
This procedure is unstable and requires a very small learning rate and a simple optimisation algorithm.
Also, since the body is huge, we can only work with a small batch size to fit in GPU memory.
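
One optional way to keep fine-tuning gentle on the pretrained body is to give it a smaller learning rate than the head through optimiser parameter groups. This is a sketch of that idea, not what the cell below does; `toy_body`, `toy_head` and the learning rates are illustrative assumptions.

```python
import torch
import torch.nn as nn

toy_body = nn.Linear(8, 4)   # hypothetical stand-in for the pretrained body
toy_head = nn.Linear(4, 1)   # hypothetical stand-in for the new head

# Learning rates here are illustrative assumptions, not tuned values.
toy_optimizer = torch.optim.SGD(
    [
        {"params": toy_body.parameters(), "lr": 1e-4},  # smaller steps for the body
        {"params": toy_head.parameters()},              # falls back to the default lr
    ],
    lr=1e-3,
)

group_lrs = [group["lr"] for group in toy_optimizer.param_groups]
print(group_lrs)
```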

In [None]:
train_batch_gen = torch.utils.data.DataLoader(train_dataset, 
                                              batch_size=32,
                                              shuffle=True,
                                              num_workers=10)


for param in tn.pretrained_model.parameters():
    param.requires_grad = True


loss_function = torch.nn.BCEWithLogitsLoss()
learning_rate = 0.001

optimizer = torch.optim.SGD(tn.parameters(), lr=learning_rate)


In [None]:
n_epochs = 10

for i_epoch in range(n_epochs):
    tn.train()
    tn.pretrained_model.eval()
    print("Train:" + "-"*30 + "->\n")
    for (X_batch, y_batch) in train_batch_gen:
        
        loss = loss_function(tn(X_batch.to(device)), y_batch.view(-1,1).to(device))

        tn.zero_grad()
        loss.backward()
        optimizer.step()

        logger.fill_train(loss.item())
  
    tn.eval()
    y_true = []
    y_pred = []
    print("Eval:" + "-"*30 + "->\n")
    for (X_batch, y_batch) in val_batch_gen:
        with torch.no_grad():
            y_net = tn(X_batch.to(device))
            loss = loss_function(y_net, y_batch.view(-1,1).to(device))
            y_pred.extend(y_net.detach().cpu().numpy())
            y_true.extend(y_batch.view(-1,1).detach().cpu().numpy())
            logger.fill_test(loss.item())
    logger.fill_accuracy(np.array(y_true), np.array(y_pred) > 0.)
    logger.finish_epoch()