# Fine-Tuning

Reusing an already pre-trained neural network

In [1]:
%matplotlib inline
import os
import torch
import torchvision
import torchvision.transforms as transforms
from torch import nn
from d2l import torch as d2l

We may not always have enough data or computing resources to train a large model from scratch

One solution, aside from the obvious but costly and time-consuming step of collecting more data, is to reuse part of a state-of-the-art neural network trained on a large-scale dataset (usually ImageNet)  
The idea is to reuse the features learned on a very general dataset and *fine-tune* them on a problem-specific dataset

This technique is called *transfer learning*: we transfer the knowledge learned on the source dataset to the target dataset  
The usual way to do it is to copy every layer except the output layer (i.e., the layer that classifies inputs based on the extracted features)

<center>
    <img src='images/finetune.svg' />
    <p>Source: <a href='http://d2l.ai'>d2l.ai</a></p>
</center>

When applying fine-tuning, we update the weights of the pre-trained neural network to better fit the new dataset  

Another technique, called *feature extraction*, freezes the weights of the pre-trained layers and only trains the output layer. If your source and target datasets are very similar, you might get good results this way at a low training cost   

However, in practice it is common to freeze the first layers and only train the last layers of the pre-trained neural network. The intuition is that the first convolutional layers learn very generic features, while deeper ones learn more class-specific features

PyTorch, through `torchvision`, provides pre-trained state-of-the-art neural networks

In [2]:
pretrained_net = torchvision.models.resnet18(pretrained=True)

In [3]:
pretrained_net.fc

Linear(in_features=512, out_features=1000, bias=True)

In [4]:
# If you only want to train the head of the network, freeze the
# pre-trained parameters before swapping in the new head:
# for param in pretrained_net.parameters():
#     param.requires_grad = False

The pre-trained head has 1000 outputs because ImageNet-1k contains 1000 classes; CIFAR-10, the target dataset here, has only 10

In [5]:
bs = 32

transform_train = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])

transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])


trainset = torchvision.datasets.CIFAR10(root='data', train=True, download=True, transform=transform_train)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=bs, shuffle=True, num_workers=8)

testset = torchvision.datasets.CIFAR10(root='data', train=False, download=True, transform=transform_test)
testloader = torch.utils.data.DataLoader(testset, batch_size=bs*2, shuffle=False, num_workers=8)

Files already downloaded and verified
Files already downloaded and verified


We then simply swap the head for a freshly initialized `Linear` layer with the correct number of outputs (10 for CIFAR-10)

In [6]:
pretrained_net.fc = nn.Linear(pretrained_net.fc.in_features, 10)
nn.init.xavier_normal_(pretrained_net.fc.weight)
nn.init.constant_(pretrained_net.fc.bias, 0);

If we have a GPU, we use it  
`DataParallel` lets us use multiple GPUs in parallel if we have them

In [7]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'
pretrained_net = pretrained_net.to(device)
if device == 'cuda':
    pretrained_net = torch.nn.DataParallel(pretrained_net) # if multiple GPUs use them
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
optimizer = torch.optim.AdamW(pretrained_net.parameters(), lr=1e-4, weight_decay=0.0001)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, 'min', patience=3, verbose=True, min_lr=1e-5, factor=0.5)
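
The device handling above can also be written so that `DataParallel` is applied only when more than one GPU is visible. This is a sketch under that assumption, using a tiny hypothetical model in place of the ResNet; it also runs on a CPU-only machine.

```python
import torch
from torch import nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Tiny stand-in model (hypothetical, not the notebook's ResNet-18)
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10)).to(device)

# Wrap only when there is more than one GPU to split batches across
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)

# DataParallel keeps the wrapped network in `.module`
core = model.module if isinstance(model, nn.DataParallel) else model

# Inputs must live on the same device as the model
batch = torch.randn(8, 3, 32, 32, device=device)
with torch.no_grad():
    out = model(batch)
print(out.shape)  # torch.Size([8, 10])
```

Accessing the network through `core` is handy when saving weights, since the state dict of a wrapped model has every key prefixed with `module.`.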

In [8]:
import wandb
wandb.init(project="pretraining-resnet18")
from tqdm.notebook import trange, tqdm

epochs = 10
for epoch in trange(epochs):
    accurate = 0
    total = 0
    losses = 0
    for X, y in tqdm(trainloader):
        # Move the batch to the same device as the model
        X, y = X.to(device), y.to(device)
        y_pred = pretrained_net(X)
        loss = criterion(y_pred, y)
        score, predicted = torch.max(y_pred, 1)
        accurate += (y == predicted).sum().item()
        losses += loss.item()
        total += len(y)

        # zero the gradients before running
        # the backward pass.
        optimizer.zero_grad()

        # Backward pass to compute the gradient
        # of loss w.r.t our learnable params. 
        loss.backward()

        # Update params
        optimizer.step()
    
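    epoch_loss = losses / len(trainloader)
    # Step the plateau scheduler once per epoch; it monitors the training
    # loss here because this loop has no validation pass
    scheduler.step(epoch_loss)
    wandb.log({
            'loss': epoch_loss,
            'accuracy': float(accurate) / total
    })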
    

[34m[1mwandb[0m: Currently logged in as: [33mingambe[0m (use `wandb login --relogin` to force relogin)
[34m[1mwandb[0m: wandb version 0.12.7 is available!  To upgrade, please run:
[34m[1mwandb[0m:  $ pip install wandb --upgrade


KeyboardInterrupt: 
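
The notebook builds a `testloader` but the training loop never uses it. A minimal evaluation helper might look like the sketch below; the function name `evaluate`, the tiny linear model and the synthetic data are all assumptions made so that the example is self-contained.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate(model, loader, criterion, device):
    """Return (mean loss, accuracy) of `model` over `loader`."""
    model.eval()  # use running BatchNorm statistics, disable dropout
    total_loss, correct, total = 0.0, 0, 0
    with torch.no_grad():  # no gradients needed for evaluation
        for X, y in loader:
            X, y = X.to(device), y.to(device)
            logits = model(X)
            # Weight each batch loss by its size so the mean is per sample
            total_loss += criterion(logits, y).item() * len(y)
            correct += (logits.argmax(dim=1) == y).sum().item()
            total += len(y)
    return total_loss / total, correct / total


# Synthetic stand-in for a CIFAR-10 test set
torch.manual_seed(0)
data = TensorDataset(torch.randn(64, 3, 32, 32), torch.randint(0, 10, (64,)))
loader = DataLoader(data, batch_size=16)
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))

val_loss, val_acc = evaluate(model, loader, nn.CrossEntropyLoss(), "cpu")
print(f"loss={val_loss:.3f} accuracy={val_acc:.3f}")
```

With the notebook's objects this would be called as `evaluate(pretrained_net, testloader, criterion, device)` at the end of each epoch, followed by `pretrained_net.train()` before the next one. A held-out loss of this kind is what `ReduceLROnPlateau` is usually stepped on.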