<a href="https://colab.research.google.com/github/garylau1/model_training/blob/main/ResNet_from_scatach.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction

Deep learning has revolutionized the field of computer vision, and one of its cornerstone architectures is the Residual Network (ResNet). ResNet-50, the 50-layer variant of this architecture, is widely used because its residual (skip) connections make very deep networks trainable: they ease the vanishing gradient problem and the degradation in accuracy that plain networks show as depth grows. In this project, I aim to implement ResNet-50 from scratch to gain a deeper understanding of its inner workings, layer-by-layer construction, and the overall design principles that make it so effective.

The implementation process begins with building the foundational components of ResNet-50, including the Bottleneck blocks, which are the core building blocks of the network. These blocks allow ResNet-50 to achieve remarkable depth while maintaining computational efficiency. Subsequently, I will assemble the other essential layers, such as the convolutional layers, downsampling modules, and fully connected layers, to complete the architecture.

Once the architecture is fully constructed, I will demonstrate how to load pretrained weights into the custom model. With pretrained weights, the model can reuse what was learned from training on a large dataset, which improves performance and reduces the training time required for new tasks. Loading the weights successfully also checks that the layer names and parameter shapes of my implementation match those of the reference ResNet-50, and it shows how the model can be applied to practical problems.

Through this project, I aim to develop a thorough understanding of ResNet-50 and its components, while also exploring the practical aspects of transferring knowledge using pretrained weights.

In [90]:
# Import torch and the nn module used throughout the notebook
import torch
from torch import nn

# Use the GPU when one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"



# The Bottleneck block

The Bottleneck block is the core building block of ResNet-50, designed to keep computation low without giving up expressive power. It stacks three convolutions: a 1x1 convolution that reduces the number of channels, a 3x3 convolution that does the spatial processing at that reduced width, and a 1x1 convolution that restores the channel count. Because the expensive 3x3 convolution runs on fewer channels, the block is much cheaper than applying 3x3 convolutions at full width.

In this implementation, the block takes a `downsample` flag that adds a projection (a 1x1 convolution plus BatchNorm) to the shortcut, a `stride` argument, and a `change_kernel` flag that sets the stride of the 3x3 convolution and of the shortcut projection to 2. Despite its name, `change_kernel` changes the stride, not the kernel size. These options let the same class serve every stage of the network, and the shortcut connection lets each block learn a residual mapping, which addresses the degradation problem in deep networks.
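To make the efficiency claim concrete, the weight counts can be compared by hand. The sketch below is an illustration and not part of the original notebook: `conv_params` is a hypothetical helper, and the "plain" baseline of two 3x3 convolutions at the full 256 channels is an assumed point of comparison, chosen only to show the order of magnitude.

```python
def conv_params(c_in, c_out, k):
    """Number of weights in a bias-free k x k convolution (hypothetical helper)."""
    return c_in * c_out * k * k


# Bottleneck with the default sizes used in this notebook: 256 -> 64 -> 64 -> 256
bottleneck = (
    conv_params(256, 64, 1)    # 1x1 reduce
    + conv_params(64, 64, 3)   # 3x3 at reduced width
    + conv_params(64, 256, 1)  # 1x1 restore
)

# Assumed baseline: two 3x3 convolutions kept at 256 channels
plain = 2 * conv_params(256, 256, 3)

print(bottleneck, plain, round(plain / bottleneck, 1))  # 69632 1179648 16.9
```

Under these assumptions the bottleneck needs roughly 17 times fewer convolution weights than the full-width baseline, which is why the design scales to 50 layers and beyond.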

In [275]:
class Bottleneck(nn.Module):
    """
    Implementation of the Bottleneck block for ResNet.

    The Bottleneck block is a three-layer residual block used in ResNet architectures.
    It performs dimensionality reduction and restoration using `1x1` convolutions
    while applying spatial processing with a `3x3` convolution. A shortcut
    connection is added to facilitate residual learning.

    Args:
        in_channel (int): Number of input channels.
        hidden_ (int): Number of intermediate channels (reduced dimension).
        out_channel (int): Number of output channels.
        kernel_sizes (int): Kernel size for `1x1` convolutions (default: 1).
        stride (int): Stride passed to all three convolutions and to the shortcut projection (default: 1).
        downsample (bool): Whether the shortcut applies a projection (1x1 convolution + BatchNorm) to the input (default: True).
        change_kernel (bool): If True, use stride 2 in the `3x3` convolution and in the shortcut projection (default: False).

    Attributes:
        conv1 (nn.Conv2d): First `1x1` convolution layer for dimensionality reduction.
        bn1 (nn.BatchNorm2d): BatchNorm layer for the first convolution.
        conv2 (nn.Conv2d): Second `3x3` convolution layer for spatial processing.
        bn2 (nn.BatchNorm2d): BatchNorm layer for the second convolution.
        conv3 (nn.Conv2d): Third `1x1` convolution layer for dimensionality restoration.
        bn3 (nn.BatchNorm2d): BatchNorm layer for the third convolution.
        relu (nn.ReLU): ReLU activation function.
        downsample (nn.Sequential): Optional downsampling shortcut connection.

    Methods:
        forward(x):
            Defines the forward pass of the Bottleneck block.
    """
    def __init__(self,
                 in_channel=256, hidden_=64, out_channel=256,
                 kernel_sizes=1, stride=1,
                 downsample=True, change_kernel=False):
        super().__init__()

        self.downsamples = downsample  # Flag for applying downsampling

        # First 1x1 convolution: reduces the number of channels (dimensionality reduction)
        self.conv1 = nn.Conv2d(
            in_channels=in_channel, out_channels=hidden_,
            kernel_size=kernel_sizes, stride=stride, bias=False
        )
        self.bn1 = nn.BatchNorm2d(hidden_, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

        # Second 3x3 convolution: applies spatial processing.
        # With change_kernel=True the stride is 2, so this layer halves the spatial size.
        self.conv2 = nn.Conv2d(
            in_channels=hidden_, out_channels=hidden_,
            kernel_size=3, padding=1,
            stride=2 if change_kernel else stride, bias=False
        )
        self.bn2 = nn.BatchNorm2d(hidden_, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

        # Third 1x1 convolution: restores the number of channels (dimensionality restoration)
        self.conv3 = nn.Conv2d(
            in_channels=hidden_, out_channels=out_channel,
            kernel_size=kernel_sizes, stride=stride, bias=False
        )
        self.bn3 = nn.BatchNorm2d(out_channel, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

        # ReLU activation: introduces non-linearity
        self.relu = nn.ReLU(inplace=True)

        # Projection shortcut if specified: a 1x1 convolution plus BatchNorm that matches
        # the channel count (and, with change_kernel=True, the spatial size) of the main path
        if self.downsamples:
            self.downsample = nn.Sequential(
                nn.Conv2d(
                    in_channels=in_channel, out_channels=out_channel,
                    kernel_size=kernel_sizes,
                    stride=2 if change_kernel else stride, bias=False
                ),
                nn.BatchNorm2d(out_channel, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
            )

    def forward(self, x):
        """
        Forward pass through the Bottleneck block.

        Args:
            x (torch.Tensor): Input tensor with shape (batch_size, in_channel, height, width).

        Returns:
            torch.Tensor: Output tensor after applying the Bottleneck operations.
        """
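        identity = x  # Store the original input for the residual connection

        # 1x1 reduce -> 3x3 spatial -> 1x1 restore, each followed by BatchNorm.
        # ReLU follows the first two BatchNorm layers, as in the standard bottleneck design.
        x = self.relu(self.bn1(self.conv1(x)))
        x = self.relu(self.bn2(self.conv2(x)))
        x = self.bn3(self.conv3(x))

        # Project the shortcut when requested; otherwise use the input unchanged
        if self.downsamples:
            identity = self.downsample(identity)

        # Add the residual (shortcut) connection in every block, then apply the final ReLU
        x = self.relu(x + identity)
        return x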

# ResNet-50 architecture from scratch

In this section, we implement the ResNet-50 architecture from scratch. ResNet-50 is a widely used deep convolutional neural network designed for image classification tasks. It is known for its ability to achieve high performance on complex datasets due to the use of residual connections that mitigate the vanishing gradient problem in deep networks.

Our implementation follows these key steps:

1. Initial Layers: The model begins with a convolutional layer, followed by batch normalization, ReLU activation, and max pooling, which reduce the input's spatial dimensions while capturing essential features.
2. Residual Layers: The core of the model consists of four main stages (layer1 to layer4) with 3, 4, 6, and 3 Bottleneck blocks respectively. Each block includes a shortcut connection that adds its input to the output of its stack of convolutional layers. The number of filters increases progressively across stages, allowing the model to learn hierarchical feature representations.
3. Global Pooling and Classification: After the residual layers, the model applies adaptive average pooling to reduce the spatial dimensions to 1x1. A fully connected layer then maps the 2048 extracted features to 1000 class scores (logits).

This design reflects the structure of the original ResNet-50 architecture. By implementing it step by step, we not only replicate its functionality but also gain a deeper understanding of its inner workings. Finally, we prepare the model to load pretrained weights, which enhances its performance on various tasks without the need for training from scratch.
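The "50" in the name can be recovered from this layout with a short count. The sketch below is an illustration under the usual counting convention, which is an assumption on my part: only the stem convolution, the three convolutions in each Bottleneck block, and the final fully connected layer are counted, while the 1x1 convolutions in the projection shortcuts are left out.

```python
# Blocks per stage, matching layer1..layer4 in this notebook (1 + 2, 1 + 3, 1 + 5, 1 + 2)
blocks_per_stage = [3, 4, 6, 3]
convs_per_block = 3  # 1x1 reduce, 3x3 spatial, 1x1 restore

stem_conv = 1  # the initial 7x7 convolution
fc_layer = 1   # the final linear classifier

depth = stem_conv + convs_per_block * sum(blocks_per_stage) + fc_layer
print(depth)  # 50
```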

In [276]:
class ResNeT_copy(nn.Module):
    """
    Implementation of the ResNet-50 architecture from scratch.

    This class builds the ResNet-50 model step by step using the following components:
    - Initial convolutional layer with BatchNorm, ReLU, and max pooling.
    - Four sequential layers (layer1 to layer4) comprising Bottleneck blocks,
      with increasing channel dimensions as the network deepens.
    - Adaptive average pooling to reduce the spatial dimensions to 1x1.
    - Fully connected (linear) layer for classification.

    Args:
        None. Default settings are used to build ResNet-50.

    Attributes:
        conv1 (nn.Conv2d): Initial convolutional layer with 64 filters of size 7x7.
        bn1 (nn.BatchNorm2d): Batch normalization layer for the initial convolution.
        relu (nn.ReLU): ReLU activation function.
        maxpool (nn.MaxPool2d): Max pooling layer to reduce spatial dimensions.
        layer1-4 (nn.Sequential): Stacked Bottleneck blocks forming the ResNet layers.
        avgpool (nn.AdaptiveAvgPool2d): Adaptive average pooling to produce a fixed-size feature map.
        fc (nn.Linear): Fully connected layer for classification into 1000 classes.

    Methods:
        forward(x):
            Defines the forward pass through the entire ResNet-50 model.
    """
    def __init__(self):
        super().__init__()

        # Initial convolutional layer: captures basic image features
        self.conv1 = nn.Conv2d(
            in_channels=3, out_channels=64,
            kernel_size=7, stride=2, padding=3, bias=False
        )
        self.bn1 = nn.BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        self.relu = nn.ReLU(inplace=True)  # Adds non-linearity
        self.maxpool = nn.MaxPool2d(3, 2, 1, dilation=1, ceil_mode=False)

        # First layer: 64 input channels, expanded to 256 in the bottleneck blocks
        self.layer1 = nn.Sequential(
            Bottleneck(64),  # First Bottleneck block with downsampling
            *[Bottleneck(downsample=False) for i in range(2)]  # Two additional blocks
        )

        # Second layer: Expands from 256 to 512 channels
        self.layer2 = nn.Sequential(
            Bottleneck(256, 128, 512, change_kernel=True),  # First block with stride 2
            *[Bottleneck(512, 128, 512, downsample=False) for i in range(3)]  # Additional blocks
        )

        # Third layer: Expands from 512 to 1024 channels
        self.layer3 = nn.Sequential(
            Bottleneck(512, 256, 1024, change_kernel=True),  # First block with stride 2
            *[Bottleneck(1024, 256, 1024, downsample=False) for i in range(5)]  # Additional blocks
        )

        # Fourth layer: Expands from 1024 to 2048 channels
        self.layer4 = nn.Sequential(
            Bottleneck(1024, 512, 2048, change_kernel=True),  # First block with stride 2
            *[Bottleneck(2048, 512, 2048, downsample=False) for i in range(2)]  # Additional blocks
        )

        # Adaptive average pooling: Reduces each feature map to 1x1
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))

        # Fully connected layer: Maps the 2048 features to 1000 classes
        self.fc = nn.Linear(2048, 1000, bias=True)

    def forward(self, x):
        """
        Forward pass through the ResNet-50 model.

        Args:
            x (torch.Tensor): Input tensor with shape (batch_size, 3, height, width).

        Returns:
            torch.Tensor: Output tensor with shape (batch_size, 1000), representing class scores.
        """
        # Initial convolutional block
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        # Pass through the ResNet layers
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        # Global average pooling
        x = self.avgpool(x)

        # Flatten and apply the fully connected layer
        x = torch.flatten(x, start_dim=1)
        x = self.fc(x)

        return x

# Verification:

To verify the functionality of the ResNet-50 implementation, we perform a simple forward pass using a test tensor. This tensor simulates an image batch of shape (1, 3, 224, 224):

- Batch size = 1 (single image).
- Channels = 3 (RGB image).
- Height and width = 224 pixels (standard input size for ResNet models).

The goal of this test is to ensure that the network processes the input tensor correctly through all layers and outputs a tensor with the expected shape (1, 1000), representing predictions for 1000 classes (as per ImageNet classification).
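Before running the model, the intermediate feature-map sizes can be predicted with the standard output-size formula for a convolution or pooling layer, floor((n + 2p - k) / s) + 1. The sketch below is a hand calculation, not part of the original notebook: `out_size` is a hypothetical helper, and each residual stage is represented only by the 3x3 convolution of its first block, since that is the layer that carries the stride and the other layers in the stage keep the size unchanged.

```python
def out_size(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer (hypothetical helper)."""
    return (size + 2 * padding - kernel) // stride + 1


# (name, kernel, stride, padding) for each layer that can change the spatial size
steps = [
    ("conv1", 7, 2, 3),
    ("maxpool", 3, 2, 1),
    ("layer1", 3, 1, 1),
    ("layer2", 3, 2, 1),
    ("layer3", 3, 2, 1),
    ("layer4", 3, 2, 1),
]

size = 224
trace = []
for name, kernel, stride, padding in steps:
    size = out_size(size, kernel, stride, padding)
    trace.append((name, size))

print(trace)
# [('conv1', 112), ('maxpool', 56), ('layer1', 56), ('layer2', 28), ('layer3', 14), ('layer4', 7)]
```

The final 7x7 map is what the adaptive average pooling collapses to 1x1 before the fully connected layer.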

In [280]:
# Test the ResNet-50 implementation with a dummy input tensor.
test_tensor = torch.ones((1, 3, 224, 224))  # Create a tensor simulating a batch of RGB images.

# Instantiate the ResNet-50 model and pass the test tensor through it.
output = ResNeT_copy()(test_tensor)

# Print the shape of the output tensor.
print(output.shape)

torch.Size([1, 1000])


In [262]:
!pip install -q torchinfo

In [281]:
# Print out the model
from torchinfo import summary
summary(model=ResNeT_copy(),input_size=(1,3,224,224),col_names=["input_size","output_size","num_params","trainable"],
        col_width=15,
        row_settings=["var_names"])

Layer (type (var_name))                  Input Shape     Output Shape    Param #         Trainable
ResNeT_copy (ResNeT_copy)                [1, 3, 224, 224] [1, 1000]       --              True
├─Conv2d (conv1)                         [1, 3, 224, 224] [1, 64, 112, 112] 9,408           True
├─BatchNorm2d (bn1)                      [1, 64, 112, 112] [1, 64, 112, 112] 128             True
├─ReLU (relu)                            [1, 64, 112, 112] [1, 64, 112, 112] --              --
├─MaxPool2d (maxpool)                    [1, 64, 112, 112] [1, 64, 56, 56] --              --
├─Sequential (layer1)                    [1, 64, 56, 56] [1, 256, 56, 56] --              True
│    └─Bottleneck (0)                    [1, 64, 56, 56] [1, 256, 56, 56] --              True
│    │    └─Conv2d (conv1)               [1, 64, 56, 56] [1, 64, 56, 56] 4,096           True
│    │    └─BatchNorm2d (bn1)            [1, 64, 56, 56] [1, 64, 56, 56] 128             True
│    │    └─Conv2d (conv2)               [1
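The per-layer counts visible in the summary (9,408 for conv1, 128 for bn1, 4,096 for the first 1x1 convolution) can be reproduced by hand, and the same arithmetic gives the total for the whole network. The sketch below is a cross-check and not part of the original notebook; `bottleneck_params` is a hypothetical helper that assumes bias-free convolutions and two trainable values (weight and bias) per BatchNorm channel, as in the classes defined in this notebook.

```python
def bottleneck_params(c_in, hidden, c_out, projection):
    """Trainable parameters in one Bottleneck block (hypothetical helper)."""
    convs = c_in * hidden + 9 * hidden * hidden + hidden * c_out
    norms = 2 * (hidden + hidden + c_out)  # weight and bias per BatchNorm channel
    shortcut = c_in * c_out + 2 * c_out if projection else 0
    return convs + norms + shortcut


# (input channels, hidden channels, output channels, number of blocks) per stage
stages = [(64, 64, 256, 3), (256, 128, 512, 4), (512, 256, 1024, 6), (1024, 512, 2048, 3)]

total = 3 * 64 * 7 * 7 + 2 * 64  # stem: 7x7 convolution plus BatchNorm
for c_in, hidden, c_out, n_blocks in stages:
    total += bottleneck_params(c_in, hidden, c_out, projection=True)  # first block
    total += (n_blocks - 1) * bottleneck_params(c_out, hidden, c_out, projection=False)
total += 2048 * 1000 + 1000  # fully connected layer with bias

print(total)  # 25557032
```

If the total that torchinfo reports differs from this figure, a layer is likely missing or has the wrong channel sizes.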

# Load pretrained weights into our custom ResNet-50 model

In this part, we load pretrained weights into our custom ResNet-50 model. Using pretrained weights allows the model to leverage knowledge learned from large datasets (like ImageNet) without requiring extensive training from scratch. This greatly improves performance for tasks such as image classification.

Here’s what happens step by step:

1. Retrieve Pretrained Weights: The pretrained weights for ResNet-50 are obtained using `torchvision.models.ResNet50_Weights.DEFAULT.get_state_dict()`. These weights are the parameters learned by the model during training on ImageNet.
2. Load Weights: We load these weights into our `ResNeT_copy` model using `load_state_dict()`. With `strict=True`, loading raises an error unless every parameter and buffer name in our model has a matching key in the pretrained state dict and vice versa, so a successful load confirms that the layer names and shapes line up.
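The key comparison behind `strict=True` can be pictured with plain Python sets. The sketch below is an illustration of the idea, not PyTorch's implementation: `compare_keys` is a hypothetical helper, and the key lists are a small made-up sample in the naming style of a ResNet state dict. It shows why the attribute names in the custom model (`conv1`, `bn1`, `downsample`, `fc`) have to match the names used by the pretrained weights.

```python
def compare_keys(model_keys, checkpoint_keys):
    """Return (missing, unexpected) key lists, mirroring what a strict load checks."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    missing = sorted(model_keys - checkpoint_keys)     # model expects them, checkpoint lacks them
    unexpected = sorted(checkpoint_keys - model_keys)  # checkpoint has them, model does not
    return missing, unexpected


# Made-up sample of keys; a real ResNet-50 state dict has several hundred entries
model_keys = ["conv1.weight", "bn1.weight", "layer1.0.downsample.0.weight", "fc.weight"]
good_checkpoint = list(model_keys)
renamed_checkpoint = ["conv1.weight", "bn1.weight", "layer1.0.shortcut.0.weight", "fc.weight"]

print(compare_keys(model_keys, good_checkpoint))
# ([], [])
print(compare_keys(model_keys, renamed_checkpoint))
# (['layer1.0.downsample.0.weight'], ['layer1.0.shortcut.0.weight'])
```

A strict load succeeds only in the first case, where both lists are empty; in the second case a single renamed attribute is enough to make it fail.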

In [282]:
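import torchvision

# Download (or read from the local cache) the ImageNet-pretrained ResNet-50 parameters
pretrained_weights = torchvision.models.ResNet50_Weights.DEFAULT.get_state_dict()

# Keep a reference to the model so the loaded weights can be used afterwards
model = ResNeT_copy()
model.load_state_dict(pretrained_weights, strict=True)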

<All keys matched successfully>