# 06- ResNet

Deep networks with many layers suffer from the vanishing gradient problem. The gradient computed at the loss function can easily shrink towards zero after several applications of the chain rule. As a result, the weights of the early layers are barely updated and no learning takes place there.
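
To get a feel for how quickly repeated chain-rule factors shrink, here is a minimal sketch in plain Python. It assumes a chain of sigmoid activations and ignores the weights entirely, so it only illustrates the trend and is not a model of any particular network. The function names are made up for this example.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z: float) -> float:
    # Derivative of the sigmoid; its maximum value is 0.25, reached at z = 0.
    s = sigmoid(z)
    return s * (1.0 - s)

def chained_gradient(n_layers: int, z: float = 0.0) -> float:
    """Product of n_layers sigmoid derivatives, all evaluated at z."""
    grad = 1.0
    for _ in range(n_layers):
        grad *= sigmoid_grad(z)
    return grad

# Even in the best case (z = 0) the product decays geometrically with depth.
for n in (1, 5, 10, 20):
    print(n, chained_gradient(n))
```

Each extra layer multiplies the gradient by at most 0.25 in this simplified setting, which is why the signal reaching the first layers can become negligible.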

With ResNets, the gradients can flow directly through the skip connections backwards from later layers to initial filters. A ResNet connection looks like this:

![](../media/resnet/ResNet_connection.png "Deep Residual Learning for Image Recognition")

So it allows the output of the previous layer (**Res**idual inputs) to propagate through the **Net**work, creating a ResNet. A skip connection can also span multiple layers, as shown below.

<img src="../media/intro/SkipConnections.jpg" alt="SkipConnections by analyticsvidhya" style="width: 60%;"/>
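
Why the skip connection helps the gradient can be sketched with one line of calculus. The notation is an assumption made for this note: $x$ is the input of a block, $F(x)$ is the output of its stacked layers, $y$ is the block output and $L$ is the loss.

$$
y = x + F(x)
\quad\Longrightarrow\quad
\frac{\partial L}{\partial x}
= \frac{\partial L}{\partial y}\left(I + \frac{\partial F}{\partial x}\right)
= \frac{\partial L}{\partial y} + \frac{\partial L}{\partial y}\,\frac{\partial F}{\partial x}
$$

The first term passes the gradient from the later layer back unchanged, so even when $\partial F / \partial x$ is very small, the earlier layers still receive a usable signal.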

## Transfer Learning

Transfer learning is a machine learning method where a model developed for a task is reused as the starting point for a model on a second task.

It is a popular approach in deep learning, where pre-trained models are used as the starting point for computer vision and natural language processing tasks. There are two reasons for this: developing neural network models for these problems requires vast compute and time resources, and pre-trained models provide huge jumps in skill on related problems.

We will use `resnet18`, an image model trained on the ImageNet dataset to classify photographs into 1000 classes.

We will split the ResNet into a feature extractor (its common layers) and a classifier that is fine-tuned for our dataset.

<img src="../media/intro/Fine-Tuning.png" alt="Fine-Tuning by geeksforgeeks" style="width: 60%;"/>

In [None]:
import torch
import torch.nn as nn
from matplotlib import pyplot as plt
import numpy as np
import torchvision
import torchvision.models as models
from torchvision import transforms
import torch.optim as optim
import time
import tqdm
from torch.autograd import Variable
from torchvision.models import ResNet18_Weights

In [None]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f"Using {device} device")

### Hyperparameters

In [None]:
RATIO_VALIDATION = 0.2
BATCH_SIZE = 64
LEARNING_RATE = 0.01
EPOCHS = 10

### Load Dataset

It is important to have a dataset that fits pretrained models. The ResNet implemented in torchvision takes RGB images with three channels as input, whereas our dataset has a single grayscale channel.

So, here we repeat the single-channel grayscale image three times to fit the torchvision model.

In [None]:
# Load model tools
from scripts.model_tools import train_validate, test_validate, test_validate_confusion, set_fashion_dataset

In [None]:
# Define a transformation pipeline. 
transform = transforms.Compose(
    [
        transforms.ToTensor(),
        # ResNet pretrained model. Grayscale to RGB
        transforms.Lambda(lambda x: x.repeat(3, 1, 1))
    ]
)
train_ds, test_ds, train_loader, val_loader, test_loader, classes = set_fashion_dataset(transform, RATIO_VALIDATION, BATCH_SIZE)
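
The effect of `x.repeat(3, 1, 1)` in the pipeline above can be checked on a dummy tensor. This is a small sketch that assumes a 28x28 single-channel image; the random values stand in for real pixel data.

```python
import torch

# A fake grayscale image in (channels, height, width) layout.
gray = torch.rand(1, 28, 28)

# Repeat the channel dimension three times; height and width are left unchanged.
rgb = gray.repeat(3, 1, 1)

print(gray.shape, rgb.shape)

# All three channels are identical copies of the original one.
print(torch.equal(rgb[0], rgb[1]), torch.equal(rgb[1], rgb[2]))
```

The result has the three-channel shape the model expects, although it carries no extra colour information.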

### Build the model

We build upon the `models.resnet18` model. If `pretrained` is set, we initialize it with the weights obtained from its ImageNet training. Otherwise, the weights are randomly initialized and trained from scratch on our dataset.

In [None]:
class ResNetFeatureExtractor18(nn.Module):
    def __init__(self, pretrained = True):
        super(ResNetFeatureExtractor18, self).__init__()
        #model_resnet18 = models.resnet18(pretrained=pretrained) # pretrained is deprecated, replace with weights parameter
        weights = ResNet18_Weights.IMAGENET1K_V1 if pretrained else None
        model_resnet18 = models.resnet18(weights=weights)
        
        self.conv1 = model_resnet18.conv1
        self.bn1 = model_resnet18.bn1
        self.relu = model_resnet18.relu
        self.maxpool = model_resnet18.maxpool
        self.layer1 = model_resnet18.layer1
        self.layer2 = model_resnet18.layer2
        self.layer3 = model_resnet18.layer3
        self.layer4 = model_resnet18.layer4
        self.avgpool = model_resnet18.avgpool

    def forward(self, x):
        x = self.relu(self.bn1(self.conv1(x)))
        x = self.maxpool(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.avgpool(x)
        x = x.view(x.size(0), -1)

        return x

class ResClassifier(nn.Module):
    def __init__(self, dropout_p=0.5): #in_features=512
        super(ResClassifier, self).__init__()        
        self.fc = nn.Linear(512, 10)
    def forward(self, x):       
        out = self.fc(x)
        return out

# Calculate test accuracy
def test_accuracy(data_iter, netG, netF):
    """Evaluate testset accuracy of a model."""
    acc_sum,n = 0,0
    for (imgs, labels) in data_iter:
        # send data to the GPU if cuda is available
        if torch.cuda.is_available():
            imgs = imgs.cuda()
            labels = labels.cuda()
        netG.eval()
        netF.eval()
        with torch.no_grad():
            labels = labels.long()
            acc_sum += torch.sum((torch.argmax(netF(netG(imgs)), dim=1) == labels)).float()
            n += labels.shape[0]
    return acc_sum.item()/n

## Define Training for the model 

We will use `resnet18` as our feature extractor, and pass its output (bottleneck) to our classifier.

In [None]:
def train_resnet(model_resnet, model_classifier, loss_fn, optimizer_resnet, optimizer_classifier, train_loader, test_loader, n_epochs:int=10):
    if torch.cuda.is_available():
        model_resnet = model_resnet.cuda()
        model_classifier = model_classifier.cuda()
    
    for epoch in range(0, n_epochs):
        n, start = 0, time.time()
        train_l_sum = torch.tensor([0.0], dtype=torch.float32)
        train_acc_sum = torch.tensor([0.0], dtype=torch.float32)
        for i, (imgs, labels) in tqdm.tqdm(enumerate(iter(train_loader))):
            model_resnet.train()
            model_classifier.train()
            # train on GPU if possible  
            if torch.cuda.is_available():
                imgs = imgs.cuda()
                labels = labels.cuda()
                train_l_sum = train_l_sum.cuda()
                train_acc_sum = train_acc_sum.cuda()
    
            optimizer_resnet.zero_grad()
            optimizer_classifier.zero_grad()
    
            # extracted feature
            bottleneck = model_resnet(imgs)     
            
            # predicted labels
            label_hat = model_classifier(bottleneck)
    
            # loss function
            loss= loss_fn(label_hat, labels)
            loss.backward()
            optimizer_resnet.step()
            optimizer_classifier.step()
            
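            # calculate running training loss and accuracy
            labels = labels.long()
            # loss is the batch mean: detach it from the graph and weight it
            # by the batch size so that train_l_sum / n is a per-sample average
            train_l_sum += loss.detach() * labels.shape[0]
            train_acc_sum += torch.sum(torch.argmax(label_hat, dim=1) == labels).float()
            n += labels.shape[0]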
        test_acc = test_accuracy(iter(test_loader), model_resnet, model_classifier) 
        print('epoch %d, loss %.4f, train acc %.3f, test acc %.3f, time %.1f sec'\
            % (epoch + 1, train_l_sum/n, train_acc_sum/n, test_acc, time.time() - start))

### Training without Pre-trained model

The training will take a bit longer (although `resnet18` is a comparatively lightweight deep neural network), since we only use the ResNet structure and train our model from scratch.

In [None]:
netG = ResNetFeatureExtractor18(pretrained = False)
netF = ResClassifier()

In [None]:
# Setting up optimizer for both feature generator G and classifier F.
opt_g = optim.SGD(netG.parameters(), lr=LEARNING_RATE, weight_decay=0.0005)
opt_f = optim.SGD(netF.parameters(), lr=LEARNING_RATE, momentum=0.9, weight_decay=0.0005)
# Loss function
criterion = nn.CrossEntropyLoss()

In [None]:
train_resnet(netG, netF, criterion, opt_g, opt_f, train_loader, test_loader, n_epochs=EPOCHS)

## Fine Tuning a Pre-trained model

Training with the pre-trained model is around 2% more accurate than training the non-pre-trained model, as Fashion is a more complicated dataset.

In [None]:
netG = ResNetFeatureExtractor18(pretrained=True)
netF = ResClassifier()

In [None]:
# Setting up optimizer for both feature generator G and classifier F.
opt_g = optim.SGD(netG.parameters(), lr=LEARNING_RATE, weight_decay=0.0005)
opt_f = optim.SGD(netF.parameters(), lr=LEARNING_RATE, momentum=0.9, weight_decay=0.0005)
# loss function
criterion = nn.CrossEntropyLoss()

In [None]:
train_resnet(netG, netF, criterion, opt_g, opt_f, train_loader, test_loader, n_epochs=EPOCHS)

**Optional Notebook: [A0-PyTorch Tutorial](A0-PyTorch%20Tutorial.ipynb)**