<a href="https://colab.research.google.com/github/gnoejh/ict1022/blob/main/Architectures/resnet.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Residual Networks (ResNet)

## Introduction

Residual Networks (ResNets) represent a breakthrough in deep neural network design that addressed the degradation problem in very deep networks. Introduced by Kaiming He et al. from Microsoft Research in their seminal 2015 paper ["Deep Residual Learning for Image Recognition"](https://arxiv.org/abs/1512.03385), ResNets won the ILSVRC 2015 classification competition and have become one of the most influential architectures in deep learning.

## The Degradation Problem

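Prior to ResNet, researchers observed that beyond a certain depth, stacking more layers in a plain network raised not only the test error but also the training error. This counter-intuitive phenomenon, called the degradation problem, was not caused by overfitting (which would lower training error, not raise it) but by the difficulty of optimizing very deep networks. He et al. illustrate it with a 56-layer plain network that trains to a higher error than a 20-layer one on CIFAR-10.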

![Degradation Problem](https://miro.medium.com/max/600/1*5qUe8VDuN4Jie_6eFFpXig.png)

*Image showing training/validation error increasing with depth in plain networks*

## Key Innovation: Residual Learning

The core insight of ResNet is the introduction of **residual connections** (also called skip connections or shortcut connections). Instead of learning a direct mapping $H(x)$, the network learns the residual function $F(x) = H(x) - x$, which can be rearranged as $H(x) = F(x) + x$.

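Because the shortcut adds $x$ directly to the block's output, gradients have an unobstructed path back to earlier layers during backpropagation, which makes much deeper architectures trainable. It also makes an identity mapping easy to represent: the block only has to drive $F(x)$ toward zero.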
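
The gradient argument can be made concrete with a short derivation. As a simplifying assumption, take a single block with output $y = F(x) + x$ and ignore the ReLU applied after the addition; $\mathcal{L}$ denotes the training loss (notation introduced here, not in the original text). By the chain rule:

$$
\frac{\partial \mathcal{L}}{\partial x} = \frac{\partial \mathcal{L}}{\partial y}\left(\frac{\partial F}{\partial x} + I\right)
$$

The identity term $I$ passes the upstream gradient through unchanged, so the signal reaching $x$ does not depend solely on the weights inside $F$.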

![Residual Block](https://miro.medium.com/max/700/1*6xp-IY-M8lEEEN_W7BpWPA.png)

*The basic residual block structure*

## ResNet Architecture

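ResNet comes in various depths, with ResNet-18, ResNet-34, ResNet-50, ResNet-101, and ResNet-152 being the most common variants. The number counts the weighted layers: the convolutions on the main path plus the final fully connected layer (projection shortcuts are not counted).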
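
The depth in each name can be reproduced with a little arithmetic. The sketch below assumes the blocks-per-stage configurations from Table 1 of He et al. (2015), with two convolutions per basic block and three per bottleneck block (both block types are described in the next subsection). The helper name `resnet_depth` is hypothetical.

```python
def resnet_depth(blocks_per_stage, convs_per_block):
    """Count weighted layers: stem conv + convs inside the blocks + final FC layer."""
    stem_conv = 1
    fc = 1
    return stem_conv + convs_per_block * sum(blocks_per_stage) + fc


# (blocks per stage, convolutions per block)
configs = {
    "ResNet-18": ([2, 2, 2, 2], 2),
    "ResNet-34": ([3, 4, 6, 3], 2),
    "ResNet-50": ([3, 4, 6, 3], 3),
    "ResNet-101": ([3, 4, 23, 3], 3),
    "ResNet-152": ([3, 8, 36, 3], 3),
}

for name, (blocks, convs) in configs.items():
    print(name, resnet_depth(blocks, convs))
```

Each printed depth matches the number in the variant's name, for example 1 + 2 × 8 + 1 = 18 for ResNet-18.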

### Basic Building Blocks

1. **Basic Block**: Used in shallower networks (ResNet-18, ResNet-34)
   - Two 3×3 convolutional layers with batch normalization and ReLU
   - Identity shortcut connection (replaced by a 1×1 projection when the stride or channel count changes)

2. **Bottleneck Block**: Used in deeper networks (ResNet-50+)
   - 1×1 conv for dimension reduction
   - 3×3 conv for filtering
   - 1×1 conv for dimension restoration
   - Identity shortcut connection (replaced by a 1×1 projection when the stride or channel count changes)

![ResNet Architecture](https://miro.medium.com/max/700/1*zS2ChIMwQ8x8lslF7QK9xQ.png)

*Comparing basic and bottleneck blocks*
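
One way to see why deeper variants switch to the bottleneck design is to compare convolution weights at the same input/output width. The sketch below is a rough count under stated assumptions: it ignores batch normalization parameters and shortcut projections, uses a 4× reduction inside the bottleneck, and picks a width of 256 channels purely for illustration. The helper names are hypothetical.

```python
def conv_params(in_ch, out_ch, kernel_size):
    """Number of weights in a bias-free 2D convolution."""
    return in_ch * out_ch * kernel_size * kernel_size


def basic_block_params(channels):
    # Two 3x3 convolutions at constant width
    return 2 * conv_params(channels, channels, 3)


def bottleneck_params(channels, reduction=4):
    mid = channels // reduction
    return (
        conv_params(channels, mid, 1)    # 1x1 conv: reduce channels
        + conv_params(mid, mid, 3)       # 3x3 conv: spatial filtering
        + conv_params(mid, channels, 1)  # 1x1 conv: restore channels
    )


print(basic_block_params(256))  # 1179648
print(bottleneck_params(256))   # 69632
```

At this width the bottleneck needs roughly 17 times fewer convolution weights, which is what makes networks with 50 or more layers affordable.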

### Overall Structure

1. Initial 7×7 convolution with stride 2
2. 3×3 max pooling with stride 2
3. Four stages of residual blocks (channels double and spatial dimensions halve at each stage after the first)
4. Global average pooling
5. Fully connected layer with softmax
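
The steps above can be traced for a 224×224 input with the standard output-size formula for convolution and pooling. This is a sketch under assumptions: the padding values (3 for the stem convolution, 1 for the pooling and 3×3 convolutions) are taken from the implementation later in this notebook, the channel counts are the base widths used by the basic block (bottleneck stages output 4× as many), and the function names are hypothetical.

```python
def conv_out(size, kernel, stride, padding):
    """Output size of a convolution or pooling layer along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def trace_resnet_shapes(input_size=224, base_channels=64):
    shapes = []
    size = conv_out(input_size, kernel=7, stride=2, padding=3)  # stem convolution
    shapes.append(("conv1", base_channels, size))
    size = conv_out(size, kernel=3, stride=2, padding=1)        # max pooling
    shapes.append(("maxpool", base_channels, size))

    channels = base_channels
    for stage in range(1, 5):
        if stage > 1:
            # Stages 2-4 open with a stride-2 convolution and double the width
            size = conv_out(size, kernel=3, stride=2, padding=1)
            channels *= 2
        shapes.append((f"stage{stage}", channels, size))

    shapes.append(("avgpool", channels, 1))  # global average pooling
    return shapes


for name, channels, size in trace_resnet_shapes():
    print(f"{name:8s} {channels:4d} x {size} x {size}")
```

The trace goes 112 → 56 → 56 → 28 → 14 → 7 → 1, so the fully connected layer sees a 512-dimensional vector in the basic-block variants.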

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F

# Basic residual block implementation
class BasicBlock(nn.Module):
    expansion = 1
    
    def __init__(self, in_channels, out_channels, stride=1):
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        
        # Shortcut connection to match dimensions
        self.shortcut = nn.Sequential()
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels)
            )
    
    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out += self.shortcut(x)  # Add the shortcut connection
        out = F.relu(out)        # Apply ReLU after addition
        return out

In [None]:
# Bottleneck block for deeper ResNets
class Bottleneck(nn.Module):
    expansion = 4
    
    def __init__(self, in_channels, out_channels, stride=1):
        super(Bottleneck, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.conv3 = nn.Conv2d(out_channels, out_channels * self.expansion, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(out_channels * self.expansion)
        
        # Shortcut connection
        self.shortcut = nn.Sequential()
        if stride != 1 or in_channels != out_channels * self.expansion:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels * self.expansion, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels * self.expansion)
            )
    
    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = F.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        out += self.shortcut(x)  # Add the shortcut connection
        out = F.relu(out)        # Apply ReLU after addition
        return out

In [None]:
# Complete ResNet Implementation
class ResNet(nn.Module):
    def __init__(self, block, num_blocks, num_classes=1000):
        super(ResNet, self).__init__()
        self.in_channels = 64
        
        # Initial convolution
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        
        # Residual blocks
        self.layer1 = self._make_layer(block, 64, num_blocks[0], stride=1)
        self.layer2 = self._make_layer(block, 128, num_blocks[1], stride=2)
        self.layer3 = self._make_layer(block, 256, num_blocks[2], stride=2)
        self.layer4 = self._make_layer(block, 512, num_blocks[3], stride=2)
        
        # Classification head
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512 * block.expansion, num_classes)
    
    def _make_layer(self, block, out_channels, num_blocks, stride):
        strides = [stride] + [1] * (num_blocks - 1)
        layers = []
        for stride in strides:
            layers.append(block(self.in_channels, out_channels, stride))
            self.in_channels = out_channels * block.expansion
        return nn.Sequential(*layers)
    
    def forward(self, x):
        x = F.relu(self.bn1(self.conv1(x)))
        x = self.maxpool(x)
        
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.fc(x)
        
        return x

# Define specific variants of ResNet
def ResNet18():
    return ResNet(BasicBlock, [2, 2, 2, 2])

def ResNet50():
    return ResNet(Bottleneck, [3, 4, 6, 3])

def ResNet152():
    return ResNet(Bottleneck, [3, 8, 36, 3])
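
A quick way to check an implementation like the one above is to push a dummy batch through it and count parameters. The helper below is a minimal sketch; `summarize` is a hypothetical name, and the tiny `nn.Sequential` model is only a stand-in so the snippet runs on its own.

```python
import torch
import torch.nn as nn


def summarize(model, input_size=(1, 3, 224, 224)):
    """Run one dummy forward pass; return (output shape, trainable parameter count)."""
    model.eval()
    with torch.no_grad():
        out = model(torch.randn(*input_size))
    n_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return tuple(out.shape), n_params


# Hypothetical stand-in model; in the notebook, pass ResNet18() or ResNet50() instead
tiny = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1, bias=False),
    nn.AdaptiveAvgPool2d((1, 1)),
    nn.Flatten(),
    nn.Linear(8, 1000),
)
shape, n_params = summarize(tiny)
print(shape, n_params)
```

With the classes defined in this notebook, `summarize(ResNet18())` should report an output shape of `(1, 1000)`, one logit per ImageNet class.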

## Using Pre-trained ResNet Models

Modern deep learning frameworks provide pre-trained ResNet models that can be used for inference or fine-tuning.

In [None]:
# Using pre-trained ResNet in PyTorch
import torch
import torchvision.models as models
from torchvision import transforms
from PIL import Image

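# Load pre-trained ResNet-50
# (the `weights` argument replaces the deprecated `pretrained=True` in torchvision >= 0.13;
#  IMAGENET1K_V1 matches the Resize(256) / CenterCrop(224) preprocessing below)
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.eval()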

# Prepare image preprocessing
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

# Example of image loading and inference
def classify_image(image_path):
    img = Image.open(image_path).convert("RGB")  # force 3 channels (handles grayscale and RGBA files)
    img_t = preprocess(img)
    batch_t = torch.unsqueeze(img_t, 0)
    
    with torch.no_grad():
        output = model(batch_t)
    
    # Load ImageNet class labels
    import json
    import requests
    
    labels_URL = "https://raw.githubusercontent.com/anishathalye/imagenet-simple-labels/master/imagenet-simple-labels.json"
    labels = json.loads(requests.get(labels_URL).text)
    
    # Get predicted class
    _, index = torch.max(output, 1)
    predicted_label = labels[index.item()]
    
    return predicted_label

# Example usage (uncomment to use)
# label = classify_image('path/to/image.jpg')
# print(f"Predicted class: {label}")

## Using ResNet in TensorFlow/Keras

In [None]:
# Using pre-trained ResNet in TensorFlow/Keras
import tensorflow as tf
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.applications.resnet50 import preprocess_input, decode_predictions
from tensorflow.keras.preprocessing import image
import numpy as np

# Load pre-trained ResNet-50
# (named keras_model so it does not overwrite the PyTorch `model` defined earlier in this notebook)
keras_model = ResNet50(weights='imagenet')

# Function to classify an image
def classify_image_tf(image_path):
    img = image.load_img(image_path, target_size=(224, 224))
    x = image.img_to_array(img)
    x = np.expand_dims(x, axis=0)
    x = preprocess_input(x)
    
    preds = keras_model.predict(x)
    decoded = decode_predictions(preds, top=3)[0]
    
    # Each decoded entry is (WordNet ID, human-readable label, score)
    return [(wnid, label, float(score)) for wnid, label, score in decoded]

# Example usage (uncomment to use)
# predictions = classify_image_tf('path/to/image.jpg')
# for _, label, score in predictions:
#     print(f"{label}: {score:.4f}")

## Variations and Extensions of ResNet

Since its introduction, many variations of ResNet have been developed:

1. **ResNeXt**: Added a cardinality dimension to ResNets, using grouped convolutions
2. **Wide ResNet**: Used wider layers (more channels) with reduced depth
3. **SE-ResNet**: Integrated Squeeze-and-Excitation blocks for adaptive feature recalibration
4. **ResNeSt**: Combined ResNeXt, squeeze-and-excitation, and multi-path representation
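5. **EfficientNet**: Not a ResNet variant in the strict sense, but it retains residual connections; its baseline network was found by neural architecture search and then enlarged by compound scaling of depth, width, and resolution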

![ResNet Variations](https://production-media.paperswithcode.com/methods/Screen_Shot_2020-06-06_at_10.42.23_PM.png)

*Comparison of different ResNet variant block structures*
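
As an illustration of one of these extensions, the following is a minimal sketch of a Squeeze-and-Excitation block. It assumes the reduction ratio of 16 used as the default in the SE paper; the class name `SEBlock` is chosen here for illustration. In SE-ResNet the block is applied to the output of the residual branch before it is added to the shortcut.

```python
import torch
import torch.nn as nn


class SEBlock(nn.Module):
    """Squeeze-and-Excitation: reweight channels using globally pooled statistics."""

    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: (N, C, H, W) -> (N, C, 1, 1)
        self.fc = nn.Sequential(             # excitation: small bottleneck MLP
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        n, c, _, _ = x.shape
        scale = self.fc(self.pool(x).view(n, c)).view(n, c, 1, 1)
        return x * scale  # per-channel gate between 0 and 1


se = SEBlock(channels=64)
x = torch.randn(2, 64, 8, 8)
y = se(x)
print(y.shape)  # torch.Size([2, 64, 8, 8])
```

The block leaves the tensor shape unchanged, so it can be dropped into an existing residual block without touching the rest of the architecture.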

## Applications

ResNet has been widely adopted for various computer vision tasks:

1. **Image Classification**: The original purpose, achieving state-of-the-art ImageNet results at the time of publication (3.57% top-5 error for the ILSVRC 2015 ensemble)
2. **Object Detection**: As a backbone for Faster R-CNN, RetinaNet, etc.
3. **Semantic Segmentation**: As an encoder in U-Net-style networks, DeepLab, etc.
4. **Transfer Learning**: Pre-trained ResNets as feature extractors for downstream tasks
5. **Video Analysis**: Extended to 3D ResNets for video recognition tasks

## Impact and Legacy

ResNet has had a profound impact on deep learning:

1. It enabled training of much deeper networks, from tens to hundreds of layers
2. The concept of residual learning has been adopted in many other architectures
3. Skip connections have become a standard technique in modern network design
4. ResNet and its variants continue to serve as strong baselines for computer vision tasks
5. The insights from ResNet have influenced architectures in other domains, including NLP and speech recognition

## Further Reading

1. He, K., et al. (2015). [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385). CVPR 2016.
2. He, K., et al. (2016). [Identity Mappings in Deep Residual Networks](https://arxiv.org/abs/1603.05027). ECCV 2016.
3. Xie, S., et al. (2016). [Aggregated Residual Transformations for Deep Neural Networks (ResNeXt)](https://arxiv.org/abs/1611.05431). CVPR 2017.
4. Zagoruyko, S., & Komodakis, N. (2016). [Wide Residual Networks](https://arxiv.org/abs/1605.07146). BMVC 2016.
5. Hu, J., et al. (2018). [Squeeze-and-Excitation Networks](https://arxiv.org/abs/1709.01507). CVPR 2018.