<img src = "https://github.com/exponentialR/DL4CV/blob/main/media/BMC_Summer_Course_Deep_Learning_for_Computer_Vision.jpg?raw=true" alt='BMC Summer Course' width='300'/>

# BMC Summer Course: Deep Learning for Computer Vision
## Building ResNet from Scratch: A Step-by-Step Guide with PyTorch

Author: Samuel A.


## 1. Introduction

In this notebook, we'll build the ResNet (Residual Network) architecture from scratch using PyTorch. ResNet introduces residual (skip) connections, which make very deep neural networks easier to train by mitigating the vanishing gradient problem.



### What You Will Learn:
- Understanding the concept of residual blocks.
- Implementing the ResNet architecture.
- Training ResNet on the CIFAR-10 dataset.
- Evaluating the model and visualizing the results.

## 2. Setup

Before we start building the ResNet model, let's make sure we have all the necessary libraries and our environment set up.


# Import necessary libraries
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt
import numpy as np

## 3. Understanding Residual Blocks
### Why Residual Blocks?
In very deep neural networks, gradients are multiplied layer by layer during backpropagation, so they can shrink toward zero before they reach the early layers. This vanishing gradient problem makes deep networks hard to train. A residual connection adds a block's input to its output, which gives the gradient an identity path around the block's layers so that it can reach earlier layers without being repeatedly scaled down.
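
To make this concrete, here is a toy scalar sketch (not part of the original notebook). It assumes every "layer" is just a multiplication by the same small weight `w`, so the gradient through a plain stack is a product of `w` factors, while a residual layer `y = x + w * x` contributes a factor of `1 + w`. The weight `0.1` and the depth `20` are arbitrary values chosen for illustration.

```python
# Toy illustration: gradient magnitude through a stack of scalar "layers".
# Assumption: each layer is y = w * x (plain) or y = x + w * x (residual),
# with the same hypothetical weight w in every layer.

def plain_gradient(w: float, depth: int) -> float:
    """d(output)/d(input) for `depth` stacked layers y = w * x."""
    grad = 1.0
    for _ in range(depth):
        grad *= w
    return grad


def residual_gradient(w: float, depth: int) -> float:
    """d(output)/d(input) for `depth` stacked layers y = x + w * x."""
    grad = 1.0
    for _ in range(depth):
        grad *= 1.0 + w
    return grad


w, depth = 0.1, 20
plain = plain_gradient(w, depth)
residual = residual_gradient(w, depth)
print(f"plain stack:    {plain:.2e}")
print(f"residual stack: {residual:.2e}")
```

In this simplified setting the plain gradient collapses toward zero, while the residual gradient stays above one because of the identity term. Real networks are more complicated, but the same identity term is what keeps the gradient from vanishing.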

### Residual Block Implementation
A residual block consists of two or more convolutional layers with a skip connection that bypasses these layers and adds the input directly to the output. Let's implement a basic residual block.

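class ResidualBlock(nn.Module):
    expansion = 1  # ratio of block output channels to out_channels (1 for the basic block)

    def __init__(self, in_channels, out_channels, stride=1, downsample=None):
        super(ResidualBlock, self).__init__()
        # bias=False: each convolution is followed by batch norm, which has its own shift term
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        # Optional projection that reshapes the input when the stride or channel count changes
        self.downsample = downsample

    def forward(self, x):
        # Skip path: the input itself, or its projection when shapes differ
        identity = x
        if self.downsample is not None:
            identity = self.downsample(x)

        # Main path: conv -> BN -> ReLU -> conv -> BN
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)

        # Add the skip path, then apply the final activation
        out += identity
        out = self.relu(out)

        return out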
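
The `downsample` argument matters whenever the main path changes the tensor shape, because the addition in `forward` requires both branches to have identical shapes. The following is a minimal sketch of that shape-matching step on its own. The input size and the channel counts (64 to 128) are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical input: a batch of 1 with 64 channels and a 32x32 feature map.
x = torch.randn(1, 64, 32, 32)

# Main path: a 3x3 convolution that halves the resolution and doubles the channels.
main_path = nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1, bias=False)

# Shortcut: a 1x1 convolution with the same stride, followed by batch norm,
# so the skip branch ends up with the same shape as the main path.
shortcut = nn.Sequential(
    nn.Conv2d(64, 128, kernel_size=1, stride=2, bias=False),
    nn.BatchNorm2d(128),
)

main_out = main_path(x)
shortcut_out = shortcut(x)
summed = main_out + shortcut_out  # valid only because both shapes match

print(main_out.shape, shortcut_out.shape, summed.shape)
```

Without the projection, adding the raw input (64 channels, 32x32) to the main-path output (128 channels, 16x16) would raise a shape error.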

## 4. Building the ResNet Architecture
### Overview of ResNet Architecture
ResNet models are built by stacking multiple residual blocks together. There are different variants of ResNet, such as `ResNet-18`, `ResNet-34`, and `ResNet-50`, which differ in the number of layers. Here, we'll implement the ResNet-18 architecture.
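
The number in each variant's name counts its weighted layers. As a rough sketch of that bookkeeping, the snippet below adds up the stem convolution, the convolutions inside the blocks, and the final fully connected layer, following the usual convention of not counting the 1x1 shortcut projections. ResNet-50 uses a three-convolution bottleneck block, which this notebook does not implement. The dictionary layout and function name are illustrative choices.

```python
# Block counts per stage for common ResNet variants.
# convs_per_block: 2 for the basic block, 3 for the bottleneck block (ResNet-50).
variants = {
    "ResNet-18": {"blocks": [2, 2, 2, 2], "convs_per_block": 2},
    "ResNet-34": {"blocks": [3, 4, 6, 3], "convs_per_block": 2},
    "ResNet-50": {"blocks": [3, 4, 6, 3], "convs_per_block": 3},
}


def count_weighted_layers(blocks, convs_per_block):
    """Stem convolution + convolutions inside all blocks + final linear layer."""
    return 1 + convs_per_block * sum(blocks) + 1


depths = {name: count_weighted_layers(**cfg) for name, cfg in variants.items()}
for name, depth in depths.items():
    print(name, depth)
```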

### Implementation of ResNet

class ResNet(nn.Module):
    def __init__(self, block, layers, num_classes=10):  # 10 classes for CIFAR-10
        super(ResNet, self).__init__()
        self.in_channels = 64
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512 * block.expansion, num_classes)

    def _make_layer(self, block, out_channels, blocks, stride=1):
        downsample = None
        if stride != 1 or self.in_channels != out_channels * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channels, out_channels * block.expansion, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels * block.expansion),
            )

        layers = []
        layers.append(block(self.in_channels, out_channels, stride, downsample))
        self.in_channels = out_channels * block.expansion
        for _ in range(1, blocks):
            layers.append(block(self.in_channels, out_channels))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.fc(x)

        return x

def ResNet18():
    return ResNet(ResidualBlock, [2, 2, 2, 2])
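
It can help to trace how the spatial resolution shrinks as a 32x32 CIFAR-10 image passes through this network. The sketch below applies the standard output-size formula for convolution and pooling layers to the kernel sizes, strides, and padding used in `conv1`, `maxpool`, and the first convolution of each of `layer1` to `layer4`. The helper name `conv_output_size` is hypothetical.

```python
def conv_output_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output side length of a convolution or pooling layer (floor rounding)."""
    return (size + 2 * padding - kernel) // stride + 1


size = 32  # CIFAR-10 images are 32x32
trace = [size]

size = conv_output_size(size, kernel=7, stride=2, padding=3)  # conv1
trace.append(size)
size = conv_output_size(size, kernel=3, stride=2, padding=1)  # maxpool
trace.append(size)
for stage_stride in (1, 2, 2, 2):  # first 3x3 conv of layer1 .. layer4
    size = conv_output_size(size, kernel=3, stride=stage_stride, padding=1)
    trace.append(size)

print(trace)
```

For a 32x32 input the feature map is already down to a single pixel after `layer4`, so the adaptive average pool has nothing left to average. The stem was designed for larger images. A common adaptation for CIFAR-sized inputs, not used in this notebook, is a 3x3 stride-1 stem without max pooling.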

## 5. Training the ResNet Model


### Loading the Dataset
We'll train our ResNet model on the CIFAR-10 dataset, which consists of 60,000 32x32 colour images in 10 classes (6,000 images per class), split into 50,000 training images and 10,000 test images.

# Define the transformations for the training and test datasets
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

# Load CIFAR-10 dataset
train_dataset = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
train_loader = DataLoader(train_dataset, batch_size=100, shuffle=True)

test_dataset = datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
test_loader = DataLoader(test_dataset, batch_size=100, shuffle=False)


Files already downloaded and verified
Files already downloaded and verified
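
The `Normalize` transform with mean 0.5 and standard deviation 0.5 maps pixel values from [0, 1] to [-1, 1]. A small sketch of that arithmetic and its inverse, using made-up pixel values, is shown here. The inverse is what you need before displaying normalized images.

```python
import numpy as np

mean, std = 0.5, 0.5  # the values passed to transforms.Normalize for each channel

pixels = np.array([0.0, 0.25, 0.5, 1.0])  # hypothetical values after ToTensor()
normalized = (pixels - mean) / std         # what Normalize computes per channel
restored = normalized * std + mean         # inverse, useful before plotting

print(normalized)
print(restored)
```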


### Defining the Model, Loss Function, and Optimizer

# Select the device, then define the model, loss function, and optimizer

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f'Using device: {device}')

model = ResNet18().to(device)  # Move model to GPU or CPU
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)


Using device: cpu
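
Note that `nn.CrossEntropyLoss` expects raw logits and integer class labels, which is why the model's `forward` ends with the linear layer and applies no softmax. The following sketch, using random logits and made-up labels, checks that the loss equals a log-softmax followed by the negative log-likelihood loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)           # hypothetical raw outputs: 4 images, 10 classes
targets = torch.tensor([3, 0, 9, 1])  # hypothetical integer class labels

# Built-in loss applied directly to the logits
loss_ce = nn.CrossEntropyLoss()(logits, targets)

# Equivalent two-step computation: log-softmax, then negative log-likelihood
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(loss_ce.item(), loss_manual.item())
```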


### Training the Model 
Now we will train the model on the CIFAR-10 dataset.

# Training loop
for epoch in range(10):  # Number of epochs
    model.train()
    running_loss = 0.0
    for inputs, labels in train_loader:
        inputs, labels = inputs.to(device), labels.to(device)  # Move to device

        optimizer.zero_grad()

        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
    print(f'Epoch {epoch + 1}, Loss: {running_loss / len(train_loader):.4f}')


Epoch 1, Loss: 1.3609


KeyboardInterrupt: 
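
Because a full epoch on CPU is slow, a quick sanity check of the optimization steps can be useful before committing to a long run. The sketch below repeats the same zero-grad, forward, backward, step sequence on one fixed random batch with a tiny linear classifier and checks that the loss goes down. The model `tiny_model`, the batch size of 8, and the 50 steps are stand-ins chosen for speed, not part of the original notebook.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical stand-ins: a tiny linear classifier and one fixed random batch
tiny_model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
inputs = torch.randn(8, 3, 32, 32)
labels = torch.randint(0, 10, (8,))

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(tiny_model.parameters(), lr=0.001)

losses = []
tiny_model.train()
for step in range(50):
    optimizer.zero_grad()                         # clear gradients from the previous step
    loss = criterion(tiny_model(inputs), labels)  # forward pass and loss
    loss.backward()                               # backpropagate
    optimizer.step()                              # update the weights
    losses.append(loss.item())

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

If the loss on a single memorizable batch does not decrease, the problem is likely in the training code rather than in the amount of training.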

## 6. Evaluating the Model
After training, we will evaluate the ResNet model on the test data to measure its accuracy.

# Evaluation
model.eval()
correct = 0
total = 0
with torch.no_grad():
    for inputs, labels in test_loader:
        inputs, labels = inputs.to(device), labels.to(device)  # Move to device
        outputs = model(inputs)
        _, predicted = torch.max(outputs, 1)  # index of the highest logit is the predicted class
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f'Accuracy of the network on the test images: {100 * correct / total:.2f}%')
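
Overall accuracy can hide weak classes. As a sketch of how a per-class breakdown could be computed from the same `predicted` and `labels` tensors, the example below uses small made-up tensors with three classes. With CIFAR-10, `num_classes` would be 10 and the counts would be accumulated over all test batches.

```python
import torch

num_classes = 3  # hypothetical small example; CIFAR-10 would use 10
labels = torch.tensor([0, 0, 1, 1, 2])
predicted = torch.tensor([0, 1, 1, 1, 0])

# Count how many samples each class has and how many of them were predicted correctly
per_class_total = torch.bincount(labels, minlength=num_classes)
per_class_correct = torch.bincount(labels[predicted == labels], minlength=num_classes)

# clamp(min=1) avoids division by zero for classes absent from the batch
per_class_accuracy = per_class_correct.float() / per_class_total.clamp(min=1).float()
print(per_class_accuracy.tolist())
```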

## 7. Visualizing Predictions

Let's visualize some predictions made by the ResNet model on the test dataset.



# Visualize some test images and predictions
model.eval()
images, labels = next(iter(test_loader))
images, labels = images.to(device), labels.to(device)
with torch.no_grad():  # no gradients are needed for inference
    outputs = model(images)
_, predicted = torch.max(outputs, 1)

# Plot some sample images with their predictions
fig, axes = plt.subplots(1, 5, figsize=(15, 3))
for i in range(5):
    # Undo Normalize(mean=0.5, std=0.5) so pixel values are back in [0, 1] for imshow
    img = images[i].cpu().numpy() * 0.5 + 0.5
    axes[i].imshow(np.clip(np.transpose(img, (1, 2, 0)), 0, 1))
    axes[i].set_title(f'Pred: {predicted[i].item()}, True: {labels[i].item()}')
    axes[i].axis('off')
plt.show()
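
The titles above show class indices, which are hard to read. A small sketch of mapping indices to CIFAR-10 class names follows. The list is written out by hand in label order, and `make_title` is a hypothetical helper. The torchvision dataset object also exposes the same names through its `classes` attribute.

```python
# CIFAR-10 class names in label-index order
cifar10_classes = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]


def make_title(pred_idx: int, true_idx: int) -> str:
    """Build a subplot title with class names instead of raw indices."""
    return f"Pred: {cifar10_classes[pred_idx]}, True: {cifar10_classes[true_idx]}"


title = make_title(3, 5)
print(title)
```

Passing `make_title(predicted[i].item(), labels[i].item())` to `set_title` would then label each image with readable names.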
