# **ResNet18 From Scratch (With CIFAR-10)**

In [1]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
import torch.backends.cudnn as cudnn
import torch.optim as optim
import os

## BasicBlock for ResNet Implementation

In this code, we define the `BasicBlock` class, which is the building block of a ResNet architecture. ResNet introduces residual connections, which allow the network to learn identity mappings more easily, and this class implements one such block. 

The block consists of:
1. **Two Convolutional Layers**: Both convolution layers use 3x3 kernels, and batch normalization is applied after each convolution.
2. **Residual Connection (Shortcut)**: The output from the convolutional layers is added to the input through the residual connection. This helps avoid vanishing gradients, enabling the network to train deeper models.
3. **Stride and Padding**: The stride of the first convolution can be adjusted to reduce the spatial dimensions of the feature maps. If needed, the shortcut path will also be adjusted with a 1x1 convolution to match the dimensions.
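
As a rough check of the stride and padding behavior described above, the sketch below applies the standard convolution output-size formula to the three kinds of convolution a block can contain. The helper name `conv_out_size` is an illustrative assumption and is not part of the notebook.

```python
def conv_out_size(size, kernel, stride, padding):
    """Spatial output size of a convolution: floor((size + 2*padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1

# 3x3 conv, stride 1, padding 1: spatial size is preserved
same = conv_out_size(32, kernel=3, stride=1, padding=1)
# 3x3 conv, stride 2, padding 1: spatial size is halved
halved = conv_out_size(32, kernel=3, stride=2, padding=1)
# 1x1 shortcut conv, stride 2, no padding: matches the halved main path
shortcut = conv_out_size(32, kernel=1, stride=2, padding=0)

print(same, halved, shortcut)
```

Since the strided 3x3 convolution and the strided 1x1 shortcut produce the same spatial size, the two paths can be added element-wise.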

The following code implements the `BasicBlock` class, which will later be used to build the complete ResNet model.

In [2]:
class BasicBlock(nn.Module):
    def __init__(self, in_planes, planes, stride=1):
        super(BasicBlock, self).__init__()
        
        # First 3x3 convolutional layer with padding of 1 (output size equals input size when stride=1)
        self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)  # Batch Normalization after the first convolution
        
        # Second 3x3 convolutional layer with padding of 1
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)  # Batch Normalization after the second convolution
        
        # Shortcut (skip connection). Identity mapping by default
        self.shortcut = nn.Identity()  # Default shortcut is an identity mapping (no change)
        
        # If the stride is not 1 or the channel count changes, the shapes differ,
        # so we apply a 1x1 convolution to match the dimensions of the input and output
        if stride != 1 or in_planes != planes:
            self.shortcut = nn.Sequential(
                            nn.Conv2d(in_planes, planes, kernel_size=1, stride=stride, bias=False),
                            nn.BatchNorm2d(planes)
                            )
            
    def forward(self, x):
        # First convolution followed by BatchNorm and ReLU activation
        out = F.relu(self.bn1(self.conv1(x)))
        
        # Second convolution followed by BatchNorm
        out = self.bn2(self.conv2(out))
        
        # Add the shortcut (skip connection) to the output
        out += self.shortcut(x)  # This is the residual connection
        
        # Apply ReLU activation after the addition
        out = F.relu(out)
        
        return out
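
To see why the shortcut needs a projection when a block downsamples, the following sketch compares tensor shapes on the two paths. It uses standalone layers with illustrative names (`main_path`, `projection`) rather than the `BasicBlock` class itself, and the batch is random dummy data.

```python
import torch
import torch.nn as nn

x = torch.randn(2, 64, 32, 32)  # dummy batch: 2 samples, 64 channels, 32x32

# Main path of a downsampling block: 3x3 conv with stride 2 and more channels
main_path = nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1, bias=False)
# Shortcut projection: 1x1 conv with the same stride and output channels
projection = nn.Conv2d(64, 128, kernel_size=1, stride=2, bias=False)

out = main_path(x)
shortcut = projection(x)

print(x.shape)         # input shape, which no longer matches the main path
print(out.shape)       # main-path shape
print(shortcut.shape)  # projected shortcut shape, compatible with `out`

residual_sum = out + shortcut  # element-wise addition works because the shapes agree
```

Adding `x` directly to `out` would fail here, because both the channel count and the spatial size differ.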

## ResNet Architecture

In this code, we define the `ResNet` class, which builds the entire residual network using the `BasicBlock` as its core component. With two blocks per stage the result is ResNet-18; other block counts give other variants (for example, `[3, 4, 6, 3]` gives ResNet-34). ResNet models use **residual blocks** to allow gradients to flow more easily through the network during training, enabling the training of very deep networks.

Key Components of the `ResNet` Class:
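1. **Initial Convolution Layer**: The first convolutional layer uses 64 filters with a kernel size of 3x3. With a stride of 1 and padding of 1, the spatial dimensions of the input image remain the same. (The original ImageNet ResNet instead uses a 7x7, stride-2 convolution followed by max pooling; the lighter stem here suits 32x32 inputs.)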

2. **Residual Blocks**: We use the `_make_layer()` method to stack several residual blocks, where each block consists of two convolution layers with skip connections. The number of residual blocks in each layer is specified by the `num_blocks` argument.

3. **Layer Configurations**: The network has 4 layers (stages). The first keeps the 32x32 resolution, and each of the following three halves it with a stride of 2 (16x16, 8x8, 4x4) while doubling the channel count (128, 256, 512).

4. **Global Average Pooling**: After the last stage the feature map is 4x4, so average pooling with a 4x4 window reduces it to 1x1. Flattening then gives an output of size (batch size, 512).

5. **Fully Connected Layer**: The output of the global average pooling is passed through a fully connected layer, which maps it to the desired number of output classes (e.g., 10 for CIFAR-10).
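
The stage layout above implies a simple progression of feature-map shapes for a 32x32 CIFAR-10 input. The sketch below only does the arithmetic (it does not build the model), and the variable names are illustrative.

```python
# (channels, stride of the first block) for the four stages described above
stages = [(64, 1), (128, 2), (256, 2), (512, 2)]

size = 32  # CIFAR-10 images are 32x32; the initial 3x3 conv (stride 1, padding 1) keeps this
shapes = []
for channels, stride in stages:
    size = size // stride  # a stride-2 3x3 conv with padding 1 halves an even size
    shapes.append((channels, size, size))

for i, shape in enumerate(shapes, start=1):
    print(f"layer{i}: {shape}")
```

The final 4x4 map is why a pooling window of 4 reduces each channel to a single value.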

The following code implements the complete ResNet architecture and the forward pass logic.

In [3]:
# Define the ResNet model
class ResNet(nn.Module):
    def __init__(self, block, num_blocks, num_classes=10):
        super(ResNet, self).__init__()
        self.in_planes = 64
        
        # Initial convolution layer with 64 filters (3x3 kernels)
        self.conv1 = nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(64)  # Batch normalization after the first convolution
        
        # Define the layers by calling _make_layer() method for each block
        self.layer1 = self._make_layer(block, 64, num_blocks[0], stride=1)
        self.layer2 = self._make_layer(block, 128, num_blocks[1], stride=2)
        self.layer3 = self._make_layer(block, 256, num_blocks[2], stride=2)
        self.layer4 = self._make_layer(block, 512, num_blocks[3], stride=2)
        
        # Fully connected layer for final classification
        self.linear = nn.Linear(512, num_classes)
    
    def _make_layer(self, block, planes, num_blocks, stride):
        # Generate a list of strides: first block uses the given stride, rest use stride 1
        strides = [stride] + [1] * (num_blocks - 1)
        
        layers = []
        
        # Add a block for each layer
        for stride in strides:
            layers.append(block(self.in_planes, planes, stride))
            self.in_planes = planes  # Update input planes after each block
        
        return nn.Sequential(*layers)
    
    def forward(self, x):
        # Forward pass through the network
        
        # Initial convolution and batch normalization
        out = F.relu(self.bn1(self.conv1(x)))
        
        # Pass through the layers
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.layer4(out)
        
        # Global average pooling: the 4x4 window covers the whole 4x4 map produced for 32x32 inputs
        out = F.avg_pool2d(out, 4)
        
        # Flatten the output to feed into the fully connected layer
        out = out.view(out.size(0), -1)  # Flatten the tensor
        
        # Final fully connected layer for classification
        out = self.linear(out)
        
        return out
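
`F.avg_pool2d(out, 4)` acts as global average pooling only because the last feature map is 4x4 for 32x32 inputs. As one possible alternative, `F.adaptive_avg_pool2d` pools to a fixed output size regardless of the input resolution. The sketch below, on random dummy features, checks that the two agree on a 4x4 map.

```python
import torch
import torch.nn.functional as F

feat = torch.randn(2, 512, 4, 4)  # dummy output of the last stage

# Fixed 4x4 window, as used in the forward pass above
fixed = F.avg_pool2d(feat, 4).view(feat.size(0), -1)
# Adaptive pooling to a 1x1 output, independent of the input resolution
adaptive = F.adaptive_avg_pool2d(feat, 1).flatten(1)

print(fixed.shape, torch.allclose(fixed, adaptive, atol=1e-6))
```

With larger inputs (for example 64x64), the fixed window would leave a 2x2 map and the linear layer would receive the wrong number of features, whereas the adaptive variant would still produce 512 values per sample.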

## Importing and Preparing the CIFAR-10 Dataset

In this section, we will load the CIFAR-10 dataset, a commonly used dataset for training image classification models. CIFAR-10 consists of 60,000 32x32 color images in 10 classes, with 6,000 images per class. The dataset is divided into a training set of 50,000 images and a test set of 10,000 images.

We will use the `torchvision` library to load and preprocess the CIFAR-10 dataset.

### Data Augmentation and DataLoader Setup for CIFAR-10

We enhance the preprocessing pipeline for the CIFAR-10 dataset by adding data augmentation for the training set. Data augmentation helps prevent overfitting and makes the model more robust by introducing randomness during training.

Key Augmentations:
1. **Random Crop**: The `RandomCrop` transformation pads the image by 4 pixels on each side (to 40x40) and then crops a random 32x32 window. This shifts the content by up to 4 pixels in each direction, making the model more robust to small translations.
2. **Random Horizontal Flip**: The `RandomHorizontalFlip` transformation randomly flips the image horizontally with a 50% probability. This makes the model more invariant to left-right orientations of objects in the images.
3. **ToTensor**: Converts the images into PyTorch tensors, making them compatible with the neural network.
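
To make the crop and flip concrete, here is a pure-Python imitation on a single-channel image stored as a list of lists. It is a sketch of the idea only, not torchvision's implementation, and the helper names `pad_and_random_crop` and `random_hflip` are hypothetical.

```python
import random

def pad_and_random_crop(image, size, padding, rng):
    """Zero-pad a 2D list on all sides, then cut out a random size x size window."""
    height, width = len(image), len(image[0])
    padded_w = width + 2 * padding
    padded = (
        [[0] * padded_w for _ in range(padding)]
        + [[0] * padding + list(row) + [0] * padding for row in image]
        + [[0] * padded_w for _ in range(padding)]
    )
    top = rng.randint(0, height + 2 * padding - size)  # inclusive bounds
    left = rng.randint(0, padded_w - size)
    return [row[left:left + size] for row in padded[top:top + size]]

def random_hflip(image, rng, p=0.5):
    """Mirror each row with probability p."""
    return [row[::-1] for row in image] if rng.random() < p else image

rng = random.Random(0)
image = [[r * 32 + c for c in range(32)] for r in range(32)]  # dummy 32x32 image
augmented = random_hflip(pad_and_random_crop(image, size=32, padding=4, rng=rng), rng)

print(len(augmented), len(augmented[0]))  # spatial size is unchanged
```

With 4 pixels of padding there are 9 possible offsets along each axis, so each image can appear in 81 shifted positions, each optionally mirrored.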

For the test set, we only apply the `ToTensor()` transformation without any augmentation since we want the test set to represent the real-world data distribution.

After applying these transformations, we load the CIFAR-10 dataset into PyTorch `DataLoader` objects, which handle batching, shuffling, and parallel data loading efficiently.

In [4]:
train = transforms.Compose([
    transforms.RandomCrop(32, padding=4),  # Apply random cropping with padding
    transforms.RandomHorizontalFlip(),      # Randomly flip the image horizontally
    transforms.ToTensor(),                  # Convert image to tensor
])

test = transforms.Compose([
    transforms.ToTensor(),  # Convert image to tensor (no augmentation for test data)
])

# Load CIFAR-10 training and test datasets with the defined transformations
train_dataset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=train)
test_dataset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=test)

# Create DataLoader for training data (batch size 128, shuffling, using 4 worker processes)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=128, shuffle=True, num_workers=4)

# Create DataLoader for test data (batch size 100, no shuffling)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=100, shuffle=False, num_workers=4)

Files already downloaded and verified
Files already downloaded and verified


### Defining the ResNet-18 Model

The `ResNet18()` function initializes and returns a ResNet-18 model. ResNet-18 is a specific configuration of the ResNet architecture with 18 weighted layers (17 convolutional layers and one fully connected layer, not counting the 1x1 shortcut projections), designed to work well on a variety of image classification tasks, such as CIFAR-10.

1. **BasicBlock**:  
   - The ResNet-18 architecture is based on the `BasicBlock` building block, which is a simple residual block containing two convolutional layers with batch normalization.
   
2. **Layer Configuration**:  
   - The `[2, 2, 2, 2]` argument specifies the number of `BasicBlock` units in each of the four stages of the network:
     - 2 blocks in the first stage
     - 2 blocks in the second stage
     - 2 blocks in the third stage
     - 2 blocks in the fourth stage
     
   This structure allows the network to progressively learn more complex representations, with deeper layers capturing higher-level features.

This function is called to create the ResNet-18 model instance used for training and testing.
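
The "18" in the name can be recovered from the `[2, 2, 2, 2]` configuration with a little arithmetic. The sketch below uses illustrative variable names and, following the usual convention, does not count the 1x1 shortcut projections.

```python
num_blocks = [2, 2, 2, 2]
convs_per_block = 2  # each BasicBlock has two 3x3 convolutions

stem_convs = 1                                   # initial 3x3 convolution
block_convs = convs_per_block * sum(num_blocks)  # convolutions inside the residual blocks
fc_layers = 1                                    # final linear classifier

depth = stem_convs + block_convs + fc_layers
print(depth)
```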

In [5]:
def ResNet18():
    return ResNet(BasicBlock, [2,2,2,2])

### Setting up the Training Configuration for ResNet-18 on CIFAR-10

In this section, we prepare the training configuration for training the ResNet-18 model on the CIFAR-10 dataset. We will be using the following settings:

1. **Device Configuration**: 
   - We select a GPU (CUDA) when one is available and fall back to the CPU otherwise.
   - `cudnn.benchmark = True` is set to optimize performance for the current input sizes, which is beneficial when the input sizes are fixed and known.

2. **Model Setup**: 
   - We instantiate the `ResNet18` model and move it to the GPU.
   - We use `torch.nn.DataParallel()` to enable parallelism if there are multiple GPUs available.

3. **Learning Rate**: 
   - The learning rate is initially set to 0.1. This will be used by the optimizer to update the model's weights during training.

4. **Loss Function**:
   - We use **Cross Entropy Loss**, which is suitable for multi-class classification tasks like CIFAR-10.

5. **Optimizer**:
   - We use the **SGD (Stochastic Gradient Descent)** optimizer with:
     - **Momentum** of 0.9 to help accelerate convergence by considering previous gradients.
     - **Weight decay** (L2 regularization) of 0.0002 to reduce overfitting.

The file name `resnet18_cifar10.pth` defined here is used by `test()`, which saves the model under the `checkpoint` folder after every evaluation pass.
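
For intuition about what momentum and weight decay do, here is a scalar sketch of one update step, following the formulation given in the PyTorch `SGD` documentation with zero dampening and no Nesterov momentum. The function name `sgd_momentum_step` and the gradient values are illustrative assumptions.

```python
def sgd_momentum_step(param, grad, velocity, lr=0.1, momentum=0.9, weight_decay=0.0002):
    """One SGD step on a single scalar parameter."""
    grad = grad + weight_decay * param     # L2 regularization adds a pull toward zero
    velocity = momentum * velocity + grad  # running combination of past gradients
    param = param - lr * velocity          # move against the accumulated direction
    return param, velocity

param, velocity = 1.0, 0.0
for grad in (0.5, 0.5):  # two steps with the same hypothetical gradient
    param, velocity = sgd_momentum_step(param, grad, velocity)
    print(round(param, 6), round(velocity, 6))
```

With a constant gradient the second step is larger than the first, which is the acceleration effect that momentum provides.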

In [6]:
# Set device to GPU if available
device = 'cuda' if torch.cuda.is_available() else 'cpu'
cudnn.benchmark = True  # Optimizes for a fixed input size

# Initialize the ResNet-18 model and move it to the GPU
net = ResNet18()
net = net.to(device)

# Wrap the model in DataParallel for multi-GPU support (if available)
net = torch.nn.DataParallel(net)

# Define learning rate and file name for saving the model
learning_rate = 0.1
file_name = 'resnet18_cifar10.pth'

# Define the loss function (Cross Entropy Loss for classification)
criterion = nn.CrossEntropyLoss()

# Define the optimizer (SGD with momentum and weight decay)
optimizer = optim.SGD(net.parameters(), lr=learning_rate, momentum=0.9, weight_decay=0.0002)

### Training Function for ResNet-18 on CIFAR-10

The `train()` function defines the training loop for one epoch of the ResNet-18 model on the CIFAR-10 dataset. Here is a step-by-step breakdown of what happens:

1. **Model Training Mode**:
   - `net.train()` sets the model to training mode. This is necessary for layers like dropout or batch normalization, which behave differently during training vs. evaluation.

2. **Batch-wise Processing**:
   - We iterate over the `train_loader`, which loads batches of images and their corresponding labels.
   - Each batch of images (`inputs`) and labels (`targets`) is moved to the GPU (`inputs = inputs.to(device)` and `targets = targets.to(device)`).

3. **Forward Pass**:
   - The model performs a forward pass to compute the outputs (predictions) for the current batch using `outputs = net(inputs)`.

4. **Loss Calculation**:
   - The loss is computed using the `criterion` (cross-entropy loss in this case) by comparing the model's predictions (`outputs`) with the true labels (`targets`).

5. **Backpropagation and Optimization**:
   - `optimizer.zero_grad()` clears the gradients from the previous step.
   - `loss.backward()` computes the gradients of the loss with respect to the model's parameters.
   - `optimizer.step()` updates the model's parameters using the computed gradients.

6. **Tracking Metrics**:
   - The training loss (`train_loss`) and the number of correct predictions (`correct`) are accumulated for each batch.
   - Every 100th batch, the function prints the current training accuracy and loss for monitoring.

7. **Epoch Summary**:
   - After processing all batches in the epoch, the function prints the overall training accuracy (as a percentage) and the loss summed over all batches.

This function allows you to track the model's progress during training, including the per-batch accuracy and loss, as well as the final accuracy and loss for the entire epoch.
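
As a reference point for the loss values printed during training, the sketch below computes cross-entropy for a single sample in plain Python. The helper `cross_entropy` is an illustrative stand-in for what `nn.CrossEntropyLoss` computes per sample before averaging over the batch.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy for one sample: -log(softmax(logits)[target])."""
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(sum(math.exp(z - max_logit) for z in logits))
    return log_sum_exp - logits[target]

# A model that assigns equal scores to all 10 classes
uniform_loss = cross_entropy([0.0] * 10, target=3)
# A model that is confident in the correct class
confident_loss = cross_entropy([8.0 if i == 3 else 0.0 for i in range(10)], target=3)

print(uniform_loss, math.log(10), confident_loss)
```

A uniform prediction over 10 classes gives a loss of ln(10), roughly 2.30, so per-batch losses in that neighborhood at the start of training indicate near-chance predictions.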

In [7]:
def train(epoch):
    print('\n[ Train epoch: %d ]' % epoch)
    net.train()  # Set the model to training mode
    train_loss = 0
    correct = 0
    total = 0
    
    # Loop through the training data
    for batch_idx, (inputs, targets) in enumerate(train_loader):
        inputs = inputs.to(device)  # Move inputs to the GPU
        targets = targets.to(device)  # Move targets to the GPU
        
        outputs = net(inputs)  # Perform a forward pass through the model
        
        optimizer.zero_grad()  # Reset the gradients from the previous step
        loss = criterion(outputs, targets)  # Compute the loss
        loss.backward()  # Backpropagate the loss
        optimizer.step()  # Update the model weights
        
        train_loss += loss.item()  # Accumulate the training loss
        
        _, predicted = outputs.max(1)  # Get the predicted labels (index of max logit)
        
        total += targets.size(0)  # Update the total number of samples processed
        correct += predicted.eq(targets).sum().item()  # Count correct predictions
        
        if batch_idx % 100 == 0:  # Print progress every 100 batches
            print('\nCurrent batch:', str(batch_idx))
            print('Current benign train accuracy:', str(predicted.eq(targets).sum().item() / targets.size(0)))
            print('Current benign train loss:', loss.item())

    # Print the overall training results for the epoch
    print('\nTotal benign train accuracy:', 100. * correct / total)
    print('Total benign train loss:', train_loss)

### Testing Function for ResNet-18 on CIFAR-10

The `test()` function evaluates the ResNet-18 model on the CIFAR-10 test dataset. This function performs the following tasks:

1. **Model Evaluation Mode**:
   - `net.eval()` sets the model to evaluation mode. This ensures that layers like dropout or batch normalization behave correctly during testing (i.e., no dropout and the use of running statistics for batch normalization).

2. **Batch-wise Processing**:
   - The function iterates over the `test_loader`, which provides batches of test images and their corresponding labels.
   - Each batch of images (`inputs`) and labels (`targets`) is moved to the GPU (`inputs.to(device)` and `targets.to(device)`).

3. **Forward Pass**:
   - The model performs a forward pass on the inputs to compute predictions (`outputs = net(inputs)`).

4. **Loss and Accuracy Calculation**:
   - The loss for the batch is calculated using the `criterion` (cross-entropy loss), and the total loss is accumulated.
   - The predicted class labels are computed by finding the index of the maximum value in the output logits (`_, predicted = outputs.max(1)`).
   - The number of correct predictions is counted and accumulated.

5. **Epoch Summary**:
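   - After processing all batches in the test set, the function prints the overall test accuracy and a loss figure. Because `criterion` already averages within each batch, dividing the accumulated value by the number of samples (`loss / total`) yields the mean per-sample loss divided by the batch size (100), not the mean loss itself.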

6. **Saving the Model**:
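   - The model's state dictionary (which contains the learned weights) is saved to `./checkpoint/resnet18_cifar10.pth`, overwriting the previous file on every call. Because the model is wrapped in `DataParallel`, the saved keys carry a `module.` prefix. This allows you to load the model later for inference or further training.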

This function allows you to evaluate the model's performance on the test set and save the trained model for later use.
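
The scale of the reported test loss depends on what the accumulated value is divided by. The sketch below uses hypothetical per-batch losses (not values from this run) to contrast dividing by the number of samples with dividing by the number of batches.

```python
batch_size = 100
num_batches = 100                        # 10,000 test images / 100 per batch
batch_mean_losses = [1.7] * num_batches  # hypothetical per-batch mean losses

summed = sum(batch_mean_losses)
divided_by_samples = summed / (batch_size * num_batches)  # accumulated loss / number of samples
divided_by_batches = summed / num_batches                 # mean loss when batches are equal-sized

print(divided_by_samples, divided_by_batches)
```

The two figures differ by exactly the batch size, so a reported value around 0.017 corresponds to a mean cross-entropy of about 1.7.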

In [8]:
def test(epoch):
    print('\n[ Test epoch: %d ]' % epoch)
    net.eval()  # Set the model to evaluation mode
    loss = 0
    correct = 0
    total = 0

    # Loop through the test data; gradients are not needed for evaluation
    with torch.no_grad():
        for batch_idx, (inputs, targets) in enumerate(test_loader):
            inputs, targets = inputs.to(device), targets.to(device)  # Move to the device
            total += targets.size(0)  # Update total number of samples processed

            outputs = net(inputs)  # Perform a forward pass through the model
            loss += criterion(outputs, targets).item()  # Accumulate per-batch mean loss

            _, predicted = outputs.max(1)  # Get the predicted labels (index of max logit)
            correct += predicted.eq(targets).sum().item()  # Count correct predictions

    # Print the final test results for the epoch
    print('\nTest accuracy:', 100. * correct / total)
    print('Test average loss:', loss / total)

    # Save the model's state
    state = {
        'net': net.state_dict()
    }
    os.makedirs('checkpoint', exist_ok=True)  # Create 'checkpoint' directory if it doesn't exist
    torch.save(state, './checkpoint/' + file_name)  # Save the model
    print('Model Saved!')
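
Because the network is wrapped in `DataParallel` before `state_dict()` is called, the saved parameter names start with `module.`. Loading them into a plain, unwrapped `ResNet18()` would require removing that prefix. The sketch below shows the renaming on a toy dictionary; the helper `strip_module_prefix` is hypothetical and the string values are placeholders standing in for tensors.

```python
def strip_module_prefix(state_dict, prefix="module."):
    """Return a copy of the dictionary with the DataParallel prefix removed from each key."""
    return {
        (key[len(prefix):] if key.startswith(prefix) else key): value
        for key, value in state_dict.items()
    }

# Placeholder entries; a real checkpoint maps these names to weight tensors
saved = {"module.conv1.weight": "w0", "module.linear.bias": "b0"}
cleaned = strip_module_prefix(saved)

print(sorted(cleaned))
```

With a real checkpoint, the dictionary stored under the `'net'` key would be passed through such a helper before calling `load_state_dict` on the unwrapped model.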

### Learning Rate Adjustment Function

The `adjust_learning_rate()` function dynamically adjusts the learning rate during training based on the epoch number. This is a common strategy to improve training stability and convergence as the model approaches an optimal solution.

1. **Initial Learning Rate**: 
   - The function starts with an initial learning rate defined by `learning_rate`.

2. **Learning Rate Scheduling**:
   - If the epoch number reaches 100, the learning rate is reduced by a factor of 10. This is a typical approach to decrease the learning rate after a certain number of epochs, allowing the model to refine its weights more carefully.
   - If the epoch number reaches 150, the learning rate is again reduced by another factor of 10. This helps the model converge more precisely as training progresses.

3. **Update the Optimizer**:
   - The learning rate is updated for all parameter groups in the optimizer using `optimizer.param_groups`. This ensures that the optimizer uses the adjusted learning rate during the next step.

This strategy of reducing the learning rate over time is often called **Step Decay** and can help the model reach a better minimum in the loss landscape.

In [9]:
def adjust_learning_rate(optimizer, epoch):
    lr = learning_rate  # Start with the initial learning rate
    if epoch >= 100:    # After 100 epochs, reduce learning rate by 10x
        lr /= 10
    if epoch >= 150:    # After 150 epochs, reduce learning rate by another 10x
        lr /= 10
    
    # Update the learning rate for all parameter groups in the optimizer
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr
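
The same step schedule can also be expressed with PyTorch's built-in `MultiStepLR` scheduler. The sketch below is an alternative formulation, not the notebook's method; it uses a dummy parameter in place of the model, and the names `demo_optimizer` and `scheduler` are illustrative.

```python
import torch
import torch.optim as optim

param = torch.nn.Parameter(torch.zeros(1))  # dummy parameter standing in for the model
demo_optimizer = optim.SGD([param], lr=0.1, momentum=0.9, weight_decay=0.0002)
scheduler = optim.lr_scheduler.MultiStepLR(demo_optimizer, milestones=[100, 150], gamma=0.1)

lr_by_epoch = []
for epoch in range(160):
    lr_by_epoch.append(demo_optimizer.param_groups[0]["lr"])
    demo_optimizer.step()  # training for the epoch would happen here
    scheduler.step()       # advance the schedule once per epoch

print(lr_by_epoch[0], lr_by_epoch[99], lr_by_epoch[100], lr_by_epoch[150])
```

Either approach keeps the learning rate unchanged for the first 100 epochs, so a shorter run never sees a reduced rate.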

### Training and Testing the Model for 20 Epochs

In this section, we run the training and testing loops for 20 epochs. During each epoch:

1. **Learning Rate Adjustment**:  
   - The learning rate is set according to the epoch number using the `adjust_learning_rate()` function. Its milestones are at epochs 100 and 150, so within this 20-epoch run the learning rate stays at its initial value of 0.1; the decay only takes effect in longer runs.

2. **Training Loop**:  
   - The `train()` function is called to train the model on the training dataset for one epoch. This function computes the loss, updates the model weights, and tracks training accuracy.

3. **Testing Loop**:  
   - The `test()` function is called to evaluate the model on the test dataset after each epoch. This function computes the loss and accuracy on the test set and saves the model after each test.

This loop will train and evaluate the model for a total of 20 epochs, helping us monitor the performance of the model over time.

In [10]:
# Train and test the model for 20 epochs
for epoch in range(0, 20):
    # Adjust the learning rate based on the epoch number
    adjust_learning_rate(optimizer, epoch)
    
    # Train the model for the current epoch
    train(epoch)
    
    # Test the model after the current epoch
    test(epoch)


[ Train epoch: 0 ]

Current batch: 0
Current benign train accuracy: 0.078125
Current benign train loss: 2.51173996925354

Current batch: 100
Current benign train accuracy: 0.2890625
Current benign train loss: 1.8501043319702148

Current batch: 200
Current benign train accuracy: 0.28125
Current benign train loss: 1.7279845476150513

Current batch: 300
Current benign train accuracy: 0.3828125
Current benign train loss: 1.591795802116394

Total benign train accuracy: 32.708
Total benign train loss: 721.3459944725037

[ Test epoch: 0 ]

Test accuracy: 39.27
Test average loss: 0.017211592411994932
Model Saved!

[ Train epoch: 1 ]

Current batch: 0
Current benign train accuracy: 0.4375
Current benign train loss: 1.477840542793274

Current batch: 100
Current benign train accuracy: 0.46875
Current benign train loss: 1.5304198265075684

Current batch: 200
Current benign train accuracy: 0.5390625
Current benign train loss: 1.3074949979782104

Current batch: 300
Current benign train accuracy: 0.