# Computer Vision & Convolutional Neural Network
---
    A Convolutional Neural Network (ConvNet/CNN) is a Deep Learning algorithm that can take in an input image, assign importance (learnable weights and biases) to various aspects/objects in the image, and differentiate one from another.
    
    CNNs power image recognition and computer vision tasks. Computer vision is a field of artificial intelligence (AI) that enables computers and systems to derive meaningful information from digital images, videos, and other visual inputs, and to take action based on that information.

    Common Uses: Image and video recognition, image classification, object detection, segmentation, and ...
    
    In CNN architecture, there are three main types of layers, which are:

        - Convolutional layer
        - Pooling layer
        - Fully-connected (FC) layer
        
    The convolutional layer is the first layer of a convolutional network. While convolutional layers can be followed by additional convolutional layers or pooling layers, the fully-connected layer is the final layer. 
    
    With each successive layer, the features the CNN detects become more complex and cover larger portions of the image. Earlier layers focus on simple features, such as colors and edges. 
    
    As the image data progresses through the layers of the CNN, it starts to recognize larger elements or shapes of the object until it finally identifies the intended object.

    List of famous architectures that we will be covering in this video:
        1- LeNet  2- AlexNet  3- VGGNet  4- ResNet

>![image.png](attachment:a586e038-f50e-45d7-8a78-f38378224faa.png)

## 1- Convolution Layer

    Each layer consists of several filters (or kernels) that detect different features of the image. These filters slide over the image and extract important features such as edges, textures, and local patterns.


>nn.Conv1d: Operates on 1D data (e.g., sequences, time series analysis, audio processing, text processing).
>
>nn.Conv2d: Operates on 2D data (e.g., image processing, computer vision).
>
>nn.Conv3d: Operates on 3D data (e.g., volumetric data, medical imaging, video analysis, 3D object recognition).
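
The three layer types differ mainly in the input layout they expect. A minimal sketch, with batch size, channel counts, and sizes chosen arbitrarily for illustration: `Conv1d` takes `(N, C, L)`, `Conv2d` takes `(N, C, H, W)`, and `Conv3d` takes `(N, C, D, H, W)`.

```python
import torch
import torch.nn as nn

# Illustrative sizes only: a batch of 8 samples for each layer type
conv1d = nn.Conv1d(in_channels=1, out_channels=4, kernel_size=3)  # e.g. a mono audio signal
conv2d = nn.Conv2d(in_channels=3, out_channels=4, kernel_size=3)  # e.g. an RGB image
conv3d = nn.Conv3d(in_channels=1, out_channels=4, kernel_size=3)  # e.g. a CT volume

x1 = torch.randn(8, 1, 100)         # (N, C, L)
x2 = torch.randn(8, 3, 32, 32)      # (N, C, H, W)
x3 = torch.randn(8, 1, 16, 32, 32)  # (N, C, D, H, W)

y1, y2, y3 = conv1d(x1), conv2d(x2), conv3d(x3)
print(y1.shape)  # torch.Size([8, 4, 98])
print(y2.shape)  # torch.Size([8, 4, 30, 30])
print(y3.shape)  # torch.Size([8, 4, 14, 30, 30])
```

Without padding, every spatial dimension shrinks by `kernel_size - 1`.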
     
    torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros', device=None, dtype=None)

        Parameters
        - in_channels (int) – Number of channels in the input image
        - out_channels (int) – Number of channels produced by the convolution
        - kernel_size (int or tuple) – Size of the convolving kernel
        - stride (int or tuple, optional) – Stride of the convolution. Default: 1
        - padding (int, tuple or str, optional) – Padding added to all four sides of the input. Default: 0
        - padding_mode (str, optional) – 'zeros', 'reflect', 'replicate' or 'circular'. Default: 'zeros'
        - dilation (int or tuple, optional) – Spacing between kernel elements. Default: 1
        - groups (int, optional) – Number of blocked connections from input channels to output channels. Default: 1
        - bias (bool, optional) – If True, adds a learnable bias to the output. Default: True
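
To see how these parameters shape a layer, here is a small sketch with sizes chosen only for illustration. The weight tensor has shape `(out_channels, in_channels / groups, kH, kW)`, and `groups` splits the input channels into independent groups.

```python
import torch
import torch.nn as nn

# A 3x3 convolution mapping an RGB image (3 channels) to 16 feature maps.
# stride=1 and padding=1 keep the spatial size unchanged.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=1)

x = torch.randn(4, 3, 32, 32)  # batch of 4 RGB images, 32x32 pixels
y = conv(x)

print(y.shape)            # torch.Size([4, 16, 32, 32])
print(conv.weight.shape)  # torch.Size([16, 3, 3, 3]) -> (out_channels, in_channels/groups, kH, kW)
print(conv.bias.shape)    # torch.Size([16])

# groups=3: each input channel gets its own set of 2 filters (6 output channels in total)
grouped = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=3, padding=1, groups=3)
print(grouped.weight.shape)  # torch.Size([6, 1, 3, 3])
```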

        


>![image.png](attachment:0eb91f4c-bfad-416c-92a0-bd9adfb4bd86.png)


> ##  Activation map (Feature map)
    After passing through a convolutional layer, the image becomes abstracted to a feature map, also called an activation map.
> ![image.png](attachment:3df238ed-18b8-46f3-908b-aeca29d1e145.png)
>


In [None]:
from IPython.display import Image
Image(url='convolution.gif')


> ### Filters (Kernels)
    A filter (or kernel) is a small matrix of numbers that is applied to the image to extract specific features. Filters are usually much smaller than the input image. Each filter detects a specific feature in the image, such as edges, corners, or particular patterns.
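
As a concrete example, the sketch below slides a hand-made 3x3 vertical-edge filter (Sobel-style, chosen here for illustration; in a CNN the values are learned) over a toy 5x5 image whose left part is dark and right part is bright. Note that `F.conv2d` computes a cross-correlation, i.e. the kernel is not flipped.

```python
import torch
import torch.nn.functional as F

# Toy 1-channel 5x5 "image": dark (0) on the left, bright (1) on the right
image = torch.tensor([[0., 0., 1., 1., 1.]] * 5).reshape(1, 1, 5, 5)  # (N, C, H, W)

# Hand-made vertical-edge filter; shape (out_channels, in_channels, kH, kW)
kernel = torch.tensor([[-1., 0., 1.],
                       [-2., 0., 2.],
                       [-1., 0., 1.]]).reshape(1, 1, 3, 3)

feature_map = F.conv2d(image, kernel)  # stride 1, no padding -> 3x3 output
print(feature_map.squeeze().tolist())  # [[4.0, 4.0, 0.0], [4.0, 4.0, 0.0], [4.0, 4.0, 0.0]]
```

The filter responds strongly (4) wherever its window straddles the dark-to-bright boundary and gives 0 over the uniformly bright region.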


> ![image.png](attachment:c527ea24-dff4-4765-9732-70bbbe4056a2.png)

> ### Feature Map
    When a filter is applied to the image, the results are stored in a new matrix called a feature map.

> ### Tuning the Filters
    The filter weights (the numbers inside the filter) are optimized during training. The network learns to set the weights that best detect the important features in the images.
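
A minimal sketch of this idea, assuming a single 3x3 filter and a made-up all-zeros target feature map (purely hypothetical, for illustration): the filter values are ordinary trainable parameters, and one gradient-descent step changes them.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
conv = nn.Conv2d(1, 1, kernel_size=3, bias=False)
print(conv.weight.requires_grad)  # True: the filter values are trainable parameters

optimizer = torch.optim.SGD(conv.parameters(), lr=0.1)
x = torch.randn(1, 1, 5, 5)
target = torch.zeros(1, 1, 3, 3)  # hypothetical target feature map, for illustration only

before = conv.weight.detach().clone()
loss = ((conv(x) - target) ** 2).mean()  # mean squared error
optimizer.zero_grad()
loss.backward()   # gradients of the loss w.r.t. the filter values
optimizer.step()  # update the filter values

print(torch.equal(before, conv.weight.detach()))  # False: the filter has changed
```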

> ### Stride 
    Stride is the distance the kernel (filter) moves across the image. Put simply, the stride specifies how many pixels the kernel shifts forward each time it is applied to the image. The stride is usually a fixed value, but it can be set to different values.

    The larger the stride, the smaller the output dimensions, and vice versa.
     
> Stride is the number of pixels the filter shifts over the input matrix at each step.
![image.png](attachment:6ec2d960-e707-460d-b176-ff4a1aa68777.png) 
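
The effect of the stride on the output size can be checked directly. A small sketch, assuming a 7x7 single-channel input and a 3x3 kernel without padding (values chosen for illustration):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 1, 7, 7)  # one 7x7 single-channel input

sizes = {}
for stride in (1, 2, 3):
    conv = nn.Conv2d(1, 1, kernel_size=3, stride=stride)
    sizes[stride] = tuple(conv(x).shape[-2:])  # (height, width) of the output

print(sizes)  # {1: (5, 5), 2: (3, 3), 3: (2, 2)}
```

With stride 3 the last window position does not fit exactly, so the result is rounded down: floor((7 - 3) / 3) + 1 = 2.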




> Convolutional layers slide a set of ‘filters’ or ‘kernels’ across the input data.
> 
> Filter = 3x3   Stride = 1
> 
> ![image.png](attachment:0cb1fed4-c4af-440b-a7db-34a28c195d5d.png) 

In [None]:

Image(url='convolution_rgb.gif')

> ### Padding

    Padding means adding extra pixels around the border of the input image. These extra pixels can be zeros or other constant values. Padding is usually used to preserve the spatial size of the input image after the convolution operation.

    Zero Padding:
    In this type of padding, zeros are added around the input image. With the right amount of padding (p = (k - 1) / 2 for an odd kernel size k and stride 1), the output of the convolution has the same size as the input.


    
> With the right amount of zero padding, the output preserves the size of the original image.
> ![image.png](attachment:766fc50a-1784-42a9-8361-66af7d06d2d1.png) 
> ![image.png](attachment:211bbf40-a350-4894-b70a-36d58ab64e5d.png) 
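
A short sketch of zero padding on a 4x4 input of ones (sizes chosen for illustration): first applied by hand with `F.pad`, then built into `nn.Conv2d` through its `padding` argument.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.ones(1, 1, 4, 4)

# Zero padding by hand: one row/column of zeros on every side
padded = F.pad(x, (1, 1, 1, 1), mode='constant', value=0)  # (left, right, top, bottom)
print(padded.shape)  # torch.Size([1, 1, 6, 6])

# The same effect inside a convolution: 3x3 kernel, stride 1, padding = (3 - 1) / 2 = 1
conv_same = nn.Conv2d(1, 1, kernel_size=3, padding=1)
conv_valid = nn.Conv2d(1, 1, kernel_size=3, padding=0)
print(conv_same(x).shape)   # torch.Size([1, 1, 4, 4])  size preserved
print(conv_valid(x).shape)  # torch.Size([1, 1, 2, 2])  size shrinks
```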

## 2- Pooling Layer
    This layer reduces the spatial dimensions while keeping the important features. It plays an important role in improving the efficiency and performance of convolutional neural networks.
    
    - Max Pooling returns the maximum value covered by the Kernel
    - Average Pooling returns the average of all the values covered by the Kernel
           
            nn.MaxPool2d(kernel_size = 2, stride = 2)
            nn.AvgPool2d(kernel_size = 2, stride = 2)
>![image.png](attachment:cc601434-7a06-4a6f-bccb-25a3c3eed955.png)
>![image.png](attachment:ec94e4bd-9f48-4701-a40c-e991a68b2ef8.png)
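
Both pooling types can be compared on a small 4x4 input (values made up for illustration), using the same 2x2 window and stride 2 as above:

```python
import torch
import torch.nn as nn

x = torch.tensor([[1., 3., 2., 4.],
                  [5., 7., 6., 8.],
                  [9., 2., 1., 0.],
                  [4., 6., 3., 5.]]).reshape(1, 1, 4, 4)  # (N, C, H, W)

max_pool = nn.MaxPool2d(kernel_size=2, stride=2)
avg_pool = nn.AvgPool2d(kernel_size=2, stride=2)

# Each non-overlapping 2x2 window is reduced to a single value
print(max_pool(x).squeeze().tolist())  # [[7.0, 8.0], [9.0, 5.0]]
print(avg_pool(x).squeeze().tolist())  # [[4.0, 5.0], [5.25, 2.25]]
```

In both cases the 4x4 input becomes 2x2: pooling has no learnable weights and only summarizes each window.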

## 3- Fully Connected Layer (Classification)
    A fully connected (dense) layer is a layer in which every neuron is connected to all neurons of the previous layer. This full connectivity lets each neuron take into account all the information in the previous layer and produce the final output.
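
A minimal sketch of how a fully connected layer sits on top of the convolutional part, assuming hypothetical feature maps of shape 16 x 5 x 5 and 10 output classes: the maps are flattened into a vector and fed to `nn.Linear`.

```python
import torch
import torch.nn as nn

# Hypothetical feature maps from the last pooling layer: batch of 2, 16 channels, 5x5
features = torch.randn(2, 16, 5, 5)

flat = torch.flatten(features, start_dim=1)            # keep the batch dimension: (2, 400)
fc = nn.Linear(in_features=16 * 5 * 5, out_features=10)  # every input connects to every output
logits = fc(flat)

n_params = sum(p.numel() for p in fc.parameters())
print(flat.shape)    # torch.Size([2, 400])
print(logits.shape)  # torch.Size([2, 10]) -> one score per class
print(n_params)      # 4010 = 400 * 10 weights + 10 biases
```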

>![image.png](attachment:5795756e-993a-4786-9fbf-dc595b60b049.png)
>


>![image.png](attachment:ceb70b74-619d-4938-b4ca-282eacf135b0.png)
>

## Output (Feature Map) Size Calculation:

    floor((INPUT SIZE - FILTER SIZE + 2*PADDING) / STRIDE) + 1
    
    Example: 
    
    Model: VGG16
    Input = 224x224
    
    conv: filter = 3x3, padding = 1, stride = 1
    pool: filter = 2x2, padding = 0, stride = 2
    
    conv --> 224x224 ---> ((224 - 3 + 2) / 1) + 1 = 223 + 1 = 224 ---> 224x224
    pool --> 224x224 ---> ((224 - 2 + 0) / 2) + 1 = 111 + 1 = 112 ---> 112x112
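
The formula translates directly into a helper function. A sketch, where the name `conv_output_size` is introduced here for illustration and assumes square inputs and dilation 1; the second half cross-checks the VGG16 example against real PyTorch layers.

```python
import torch
import torch.nn as nn

def conv_output_size(input_size, filter_size, padding=0, stride=1):
    """Spatial output size of a conv/pool layer (square input, dilation 1), rounded down."""
    return (input_size - filter_size + 2 * padding) // stride + 1

# The VGG16 example above
print(conv_output_size(224, 3, padding=1, stride=1))  # 224 (conv)
print(conv_output_size(224, 2, padding=0, stride=2))  # 112 (pool)

# Cross-check against PyTorch on a dummy input
x = torch.randn(1, 3, 224, 224)
conv = nn.Conv2d(3, 64, kernel_size=3, padding=1, stride=1)
pool = nn.MaxPool2d(kernel_size=2, stride=2)
print(pool(conv(x)).shape)  # torch.Size([1, 64, 112, 112])
```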
    

In [None]:
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np
import torch.nn.functional as F
from datetime import datetime

# CIFAR-10 Dataset
    The CIFAR-10 dataset (Canadian Institute For Advanced Research) is a collection of images commonly used to train deep learning algorithms.

    - It contains 60,000 color images of size 32x32 in 10 classes, split into 50,000 training images and 10,000 test images.
    - The 10 classes are: airplane, car, bird, cat, deer, dog, frog, horse, ship, and truck. There are 6,000 images of each class.

https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html

In [None]:
# device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# Hyperparameters
num_epochs = 25
batch_size = 64
learning_rate = 0.001
# Data transforms
transform_train = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(32, padding=4),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),])

transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),])

# Data  
train_dataset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform_train)
test_dataset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform_test)
# Data loader
train_loader = torch.utils.data.DataLoader(train_dataset,batch_size = batch_size, shuffle =True)
test_loader  = torch.utils.data.DataLoader(test_dataset,batch_size  = batch_size, shuffle =False)


In [None]:
# Peek at one batch from the training loader
dataiter = iter(train_loader)
images, labels = next(dataiter)
print(images.shape, labels.shape)  # torch.Size([64, 3, 32, 32]) torch.Size([64])

In [None]:

def train_model(model, train_loader, num_epochs,  learning_rate ,device):
    # Get the total number of steps (batches) in the training loader
    total_step = len(train_loader)
 
    # Move the model to the specified device (CPU or GPU)
    model.to(device)
    
    # Define the loss function (Cross-Entropy Loss for classification)
    criterion = nn.CrossEntropyLoss()
    
    # Define the optimizer (SGD with momentum)
    optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate, momentum=0.9)


    # List to store the loss values
    all_loss = []

    # Get the current time for tracking the start time
    start_time = datetime.now()
    start_datetime = start_time.strftime('%Y-%m-%d %H:%M:%S')
    
    # Print the start time
    print('Start time:', start_datetime)
    
    # Loop over the number of epochs
    for epoch in range(num_epochs):
        # Set the model to training mode
        model.train()
        
        # List to store the loss values for the current epoch
        epoch_loss = []
        
        print('Started Training... Epoch:', epoch+1)
        print('-'*50)
        # Loop over the training data loader
        for i, (images, labels) in enumerate(train_loader):
            # Move the images and labels to the specified device
            images = images.to(device)
            labels = labels.to(device)

            # Forward pass: compute the model output
            outputs = model(images)
            
            # Compute the loss
            loss = criterion(outputs, labels)
            
            # Store the scalar value of the loss
            epoch_loss.append(loss.item())
            
            # Backward pass and optimization
            optimizer.zero_grad()  # Clear the gradients
            loss.backward()        # Compute gradients
            optimizer.step()       # Update weights
            
            # Print the loss for every 200th step
            if (i + 1) % 200 == 0:
                print(f'Epoch [{epoch + 1}/{num_epochs}], Step [{i + 1}/{total_step}], Loss: {loss.item():.4f}')
        
        # Extend the all_loss list with the loss values of the current epoch
        all_loss.extend(epoch_loss)
    
    # Get the current time for tracking the end time
    end_time = datetime.now()
    end_datetime = end_time.strftime('%Y-%m-%d %H:%M:%S')
    
    # Calculate the total duration of the training
    training_duration = end_time - start_time
    
    # Print the end time and the duration of the training
    print('End time:', end_datetime)
    print('-'*50)
    print('Finished Training. Duration:', training_duration)
    print('-'*50)

    # Return the list of all loss values and the trained model
    return all_loss, model


In [None]:
def test_model(model, test_loader, device):
    # Set the model to evaluation mode to turn off dropout, batch normalization, etc.
    model.eval()
 
    # Initialize counters for overall accuracy
    n_correct = 0
    n_samples = 0
    
    # Initialize counters for class-wise accuracy
    n_class_correct = [0 for _ in range(10)]
    n_class_samples = [0 for _ in range(10)]
    
    # Define the class names for easier readability
    classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
    
    # Disable gradient calculation since we are in inference mode
    with torch.no_grad():
        # Iterate over the test data loader
        for images, labels in test_loader:
            # Move the images and labels to the specified device (CPU or GPU)
            images = images.to(device)
            labels = labels.to(device)
            
            # Perform the forward pass to get model predictions
            outputs = model(images)
            
            # Get the predicted class by taking the index with the highest score
            _, predicted = torch.max(outputs, 1)
            
            # Update the overall sample and correct prediction counters
            n_samples += labels.size(0)
            n_correct += (predicted == labels).sum().item()
            
            # Update the class-wise correct prediction counters
            for i in range(labels.size(0)):
                label = labels[i]
                pred = predicted[i]
                if label == pred:
                    n_class_correct[label] += 1
                n_class_samples[label] += 1
    
    # Calculate and print the overall accuracy
    overall_acc = 100.0 * n_correct / n_samples
    print(f'Overall accuracy: {overall_acc:.2f} %')
    print('-'*25)
    # Calculate and print the accuracy for each class
    for i in range(10):
        if n_class_samples[i] > 0:
            class_acc = 100.0 * n_class_correct[i] / n_class_samples[i]
            print(f'Accuracy of {classes[i]}: {class_acc:.2f} %')
        else:
            print(f'Accuracy of {classes[i]}: No samples')


In [None]:
def plot_losses(losses, labels, title=None, ymin=0, ymax=None, figsize=(15, 5)):
    """
    Plots the losses from multiple experiments.
    
    Parameters:
        losses (list of lists): A list containing lists of loss values from different experiments.
        labels (list of str): A list of labels corresponding to each set of loss values.
        title (str, optional): The title of the plot.
        ymin (float, optional): The minimum y-axis value.
        ymax (float, optional): The maximum y-axis value.
        figsize (tuple, optional): The size of the figure.
    """
    
    # Create a figure and axis with the specified size
    fig, ax = plt.subplots(figsize=figsize)
    
    # Plot each set of loss values with its corresponding label
    for loss, label in zip(losses, labels):
        ax.plot(loss, label=label)
    
    # Set the title of the plot if provided
    if title:
        ax.set_title(title)
    
    # Set the y-axis label
    ax.set_ylabel('Loss')
    
    # Set the x-axis label
    ax.set_xlabel('Update Steps')
    
    # Set the limits for the y-axis if provided
    ax.set_ylim(ymin=ymin, ymax=ymax)
    
    # Add a grid to the plot for better readability
    ax.grid()
    
    # Add a legend to the plot to identify each line
    ax.legend(loc='upper right')
    


# LeNet-5 (1998)


![image.png](attachment:6c240cf2-5886-41ce-848a-fbb270285c11.png)

## Fully Connected Layer Input Size Calculation:

    ((INPUT SIZE - FILTER SIZE + 2*PADDING) / STRIDE) + 1
    
    image -> 32x32
    conv1 -> ((32 - 5 + 0) / 1) + 1 = 28 -> 28x28
    pool  -> ((28 - 2 + 0) / 2) + 1 = 14 -> 14x14
    conv2 -> ((14 - 5 + 0) / 1) + 1 = 10 -> 10x10
    pool  -> ((10 - 2 + 0) / 2) + 1 = 5  -> 5x5
    
    fc1 input = 16 channels x 5 x 5 = 400
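
The same calculation can be confirmed empirically by pushing a dummy image through the conv/pool stack of the LeNet-style `CNN` class below. A sketch that rebuilds only those layers in an `nn.Sequential` (ReLU is omitted because it does not change shapes):

```python
import torch
import torch.nn as nn

# Same conv/pool layers as the CNN class below
layers = nn.Sequential(
    nn.Conv2d(3, 6, kernel_size=5),         # 32 -> 28
    nn.MaxPool2d(kernel_size=2, stride=2),  # 28 -> 14
    nn.Conv2d(6, 16, kernel_size=5),        # 14 -> 10
    nn.MaxPool2d(kernel_size=2, stride=2),  # 10 -> 5
)

x = torch.randn(1, 3, 32, 32)  # one CIFAR-10-sized image
for layer in layers:
    x = layer(x)
    print(type(layer).__name__, tuple(x.shape))
# Conv2d (1, 6, 28, 28)
# MaxPool2d (1, 6, 14, 14)
# Conv2d (1, 16, 10, 10)
# MaxPool2d (1, 16, 5, 5)

print(x.numel())  # 400 -> in_features of fc1
```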

In [None]:
class CNN(nn.Module):
    def __init__(self):
        super().__init__()
         
        # conv1: input is 3 channels (e.g., RGB image) and outputs 6 feature maps. The kernel size is 5x5.
        self.conv1 = nn.Conv2d(in_channels = 3, out_channels= 6, kernel_size = 5)
        
        # Max-pooling layer with a 2x2 window and stride of 2
        self.pool = nn.MaxPool2d(kernel_size = 2, stride = 2)
         
        # conv2: Takes the 6 feature maps from conv1 and outputs 16 feature maps. The kernel size is 5x5.
        self.conv2 = nn.Conv2d(in_channels = 6, out_channels= 16, kernel_size= 5)
        
        # Fully connected layer: input size 16*5*5 = 400 (flattened from previous output), output size 120
        self.fc1 = nn.Linear(400 , 120)
        
        # Fully connected layer: input size 120, output size 84
        self.fc2 = nn.Linear(120 ,  84)
        
        # Fully connected layer: input size 84, output size 10 (number of classes)
        self.fc3 = nn.Linear(84  ,  10)

       
    def forward(self, x):
        # Apply first convolution, ReLU activation, and max-pooling
        out = self.pool(F.relu(self.conv1(x)))
        
        # Apply second convolution, ReLU activation, and max-pooling
        out = self.pool(F.relu(self.conv2(out)))

        # Output size per layer: ((INPUT SIZE - FILTER SIZE + 2*PADDING) / STRIDE) + 1
        # image -> 32x32
        # conv1 -> ((32 - 5 + 0) / 1) + 1 = 28
        # pool  -> ((28 - 2 + 0) / 2) + 1 = 14
        # conv2 -> ((14 - 5 + 0) / 1) + 1 = 10
        # pool  -> ((10 - 2 + 0) / 2) + 1 = 5
        
        # So each image is now 16 x 5 x 5 = 400 values
        
        # Flatten the tensor for the fully connected layers
        out = out.view(-1 , 400)    # (batch_size , 400)
        
        # Apply first fully connected layer and ReLU activation
        out = F.relu(self.fc1(out)) #400 -> 120
        
        # Apply second fully connected layer and ReLU activation
        out = F.relu(self.fc2(out)) #120 -> 84
        
        # Apply third fully connected layer to get the final output
        out = self.fc3(out)          #84 -> 10
        
        return out


In [None]:
# Instantiate your model
model = CNN().to(device)

# Train the model
cnn_loss, trained_model = train_model(model, train_loader, learning_rate=learning_rate, num_epochs=num_epochs, device=device)

# Test the model
test_model(trained_model, test_loader, device=device)


In [None]:
plot_losses([cnn_loss], ['Simple CNN'])

# Batch Norm Layer:
 
    Batch normalization is a technique for standardizing the inputs to each layer of a neural network. It keeps the mean and variance of each layer's inputs stable, which helps the network learn faster and more accurately.
    
>### Why is batch normalization important?
    - Faster learning: batch normalization lets the network train faster.
    - Training stability: the technique helps prevent problems such as unstable gradients.

    Batch normalization normalizes the input of each layer within a mini-batch by subtracting the batch mean and dividing by the batch standard deviation, then applies a learnable scale (gamma) and shift (beta).
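
A minimal sketch that reproduces `nn.BatchNorm2d` in training mode by hand, assuming a random batch and the default initialization (gamma = 1, beta = 0). For 2D feature maps the statistics are computed per channel over the batch, height, and width dimensions, using the biased variance.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(8, 3, 4, 4) * 5 + 2  # batch of 8, 3 channels, far from zero mean / unit variance

bn = nn.BatchNorm2d(num_features=3)  # gamma initialized to 1, beta to 0
bn.train()                           # training mode: use the current batch statistics
y_torch = bn(x)

# Manual version: per-channel statistics over (batch, height, width)
eps = bn.eps  # 1e-5 by default
mean = x.mean(dim=(0, 2, 3), keepdim=True)
var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)
y_manual = (x - mean) / torch.sqrt(var + eps)  # gamma * y + beta reduces to y here

print(torch.allclose(y_torch, y_manual, atol=1e-4))  # True
print(y_manual.mean(dim=(0, 2, 3)))                  # approximately 0 for every channel
```

At evaluation time (`bn.eval()`), the layer uses running averages of these statistics collected during training instead of the current batch.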
     
> ![image.png](attachment:7d7d9ab3-f555-4b47-b16a-f2ac074f1563.png) 

    There are two opinions on where the Batch Norm layer should be placed in the architecture: before or after the activation. The original paper placed it before the activation; some practitioners report that placing it after gives better results.

> ![image.png](attachment:8c801a45-4775-4495-8634-a6f57848d310.png) 

In [None]:
# Define the  CNN with Batch Norm
class CNN_BN(nn.Module):
    def __init__(self):
        super().__init__()
        # RGB --> 32 feature maps
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(32)
        # 32 feature maps --> 64 feature maps
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(64)
        # 64 feature maps --> 128 feature maps
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        self.bn3 = nn.BatchNorm2d(128)
              
        self.fc1 = nn.Linear(128 * 4 * 4, 128)
        self.fc2 = nn.Linear(128, 10)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)

    def forward(self, x):
        x = self.pool(F.relu(self.bn1(self.conv1(x))))
        x = self.pool(F.relu(self.bn2(self.conv2(x))))
        x = self.pool(F.relu(self.bn3(self.conv3(x))))
        # Output size per layer: ((INPUT SIZE - FILTER SIZE + 2*PADDING) / STRIDE) + 1
        # image -> 32x32
        # conv1 -> ((32 - 3 + 2) / 1) + 1 = 32
        # pool  -> ((32 - 2 + 0) / 2) + 1 = 16
        # conv2 -> ((16 - 3 + 2) / 1) + 1 = 16
        # pool  -> ((16 - 2 + 0) / 2) + 1 = 8
        # conv3 -> (( 8 - 3 + 2) / 1) + 1 = 8
        # pool  -> (( 8 - 2 + 0) / 2) + 1 = 4
        # So each image is now 128 x 4 x 4
        
        x = x.view(-1, 128 * 4 * 4)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x
 

In [None]:
# Instantiate your model
model2 = CNN_BN().to(device)

# Train the model
cnn_bn_loss, trained_model2 = train_model(model2, train_loader, learning_rate=learning_rate,num_epochs=num_epochs, device=device)

# Test the model
test_model(trained_model2, test_loader, device=device)


In [None]:
plot_losses([cnn_loss,cnn_bn_loss], ['CNN','CNN With BN'])

# Dropout Regularization
    Dropout is a regularization technique used to prevent overfitting. During training it randomly removes ("drops out") units of a neural network, which simulates training a large number of different architectures simultaneously. As a result, dropout can substantially reduce overfitting. 

    torch.nn.Dropout(p=0.5, inplace=False)
    
> p (float) – probability of an element to be zeroed. Default: 0.5
> 
> inplace (bool) – If set to True, will do this operation in-place. Default: False

## During Training
    - After Convolutional Layers: Apply dropout after the activation function and possibly after the pooling layer.
    - Before Fully Connected Layers: Apply dropout before the fully connected layers to regularize the network and prevent overfitting.
    - Number of Dropout Layers: Generally, you can use dropout after each convolutional block and before each fully connected layer.

## During Evaluation
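    Dropout should not be applied during evaluation, because it randomly disables neurons; the model should be evaluated in a deterministic state. In PyTorch, calling model.eval() (as test_model does) disables dropout and makes batch norm use its running statistics.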
![image.png](attachment:f7862c4a-db24-4cfb-903d-6c7e5bf829f3.png)
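
A small sketch of this train/eval difference on a vector of ones, assuming `p=0.5`. During training PyTorch scales the surviving elements by 1 / (1 - p), so the expected value of each element is unchanged; in evaluation mode the layer is the identity.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
dropout = nn.Dropout(p=0.5)
x = torch.ones(10)

dropout.train()  # training mode (what model.train() sets)
y_train = dropout(x)
print(y_train)   # each element is either 0 (dropped) or 2.0 = 1 / (1 - p) (kept and rescaled)

dropout.eval()   # evaluation mode (what model.eval() sets)
y_eval = dropout(x)
print(y_eval)    # identical to x: dropout is disabled
```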

In [None]:
class CNN_BN_DO(nn.Module):
    def __init__(self, dropout_rate=0.10):
        super(CNN_BN_DO, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(32)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(64)
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, padding=1)
        self.bn3 = nn.BatchNorm2d(128)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)
        self.dropout_conv = nn.Dropout(dropout_rate)
        self.fc1 = nn.Linear(128 * 4 * 4, 128)
        self.dropout_fc = nn.Dropout(dropout_rate/2)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.bn1(self.conv1(x))))
        x = self.dropout_conv(x)  # Apply dropout after conv1+pool
        x = self.pool(F.relu(self.bn2(self.conv2(x))))
        x = self.dropout_conv(x)  # Apply dropout after conv2+pool
        x = self.pool(F.relu(self.bn3(self.conv3(x))))
        x = self.dropout_conv(x)  # Apply dropout after conv3+pool
        
        x = x.view(-1, 128 * 4 * 4)
        x = self.dropout_fc(x)    # Apply dropout before fc1
        x = F.relu(self.fc1(x))
        x = self.dropout_fc(x)    # Apply dropout before fc2
        x = self.fc2(x)
        return x

In [None]:
# Instantiate your model
model3 = CNN_BN_DO().to(device)

# Train the model
cnn_bn_do_loss, trained_model3 = train_model(model3, train_loader, learning_rate=learning_rate,num_epochs=num_epochs, device=device)
 
# Test the model
test_model(trained_model3, test_loader, device=device)


In [None]:
plot_losses([cnn_loss,cnn_bn_loss,cnn_bn_do_loss], ['CNN','CNN With BN','CNN With BN & Dropout'])

# VGGNet (Visual Geometry Group, University of Oxford)
    
    VGG was introduced in the paper "Very Deep Convolutional Networks for Large-Scale Image Recognition" by Karen Simonyan and Andrew Zisserman in 2014. The paper describes several configurations with 11 to 19 weight layers; the best known is VGG16.
    
    VGG16 refers to a convolutional neural network with 16 weight layers, including 13 convolutional layers and 3 fully connected layers.
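
The 13 + 3 count can be read off the VGG16 layer configuration ("D" in the paper). A small sketch, where `cfg_vgg16` is a name introduced here for illustration: numbers are the output channels of each 3x3 conv, and `'M'` marks a 2x2 max-pool.

```python
# VGG16 configuration: conv output channels, 'M' = 2x2 max-pool with stride 2
cfg_vgg16 = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
             512, 512, 512, 'M', 512, 512, 512, 'M']

n_conv = sum(1 for v in cfg_vgg16 if v != 'M')
n_fc = 3  # two 4096-unit layers + the final classifier
print(n_conv, n_fc, n_conv + n_fc)  # 13 3 16

# The five max-pools halve the spatial size five times
size = 224
for v in cfg_vgg16:
    if v == 'M':
        size //= 2
print(size)  # 7 -> 512 x 7 x 7 features for 224x224 ImageNet inputs
```

With 32x32 CIFAR-10 inputs the same loop ends at 1, which matches the `512 * 1 * 1` input size of the classifier in the VGG16 class below.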


![image.png](attachment:9873b94b-d1a2-4275-83b5-c4841b3f5da8.png)

In [None]:

class VGG16(nn.Module):
    def __init__(self):
        super().__init__()
        
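        # Note: the BatchNorm2d layers are an addition; the original VGG16 has no batch
        # normalization (torchvision provides both `vgg16` and `vgg16_bn`)
        self.features = nn.Sequential(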
            # First block  ----------------------------------
            nn.Conv2d(3, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            
            nn.MaxPool2d(kernel_size=2, stride=2),
            
            # Second block ----------------------------------
            
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            
            nn.Conv2d(128, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            
            nn.MaxPool2d(kernel_size=2, stride=2),
            
            # Third block  ----------------------------------
            
            nn.Conv2d(128, 256, kernel_size=3, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            
            nn.Conv2d(256, 256, kernel_size=3, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            
            nn.MaxPool2d(kernel_size=2, stride=2),
            
            # Fourth block  ----------------------------------
            
            nn.Conv2d(256, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            
            nn.MaxPool2d(kernel_size=2, stride=2),
            
            # Fifth block  ----------------------------------
            
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        
        self.classifier = nn.Sequential(
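            # For 32x32 CIFAR-10 inputs the five 2x2 max-pools reduce 32 -> 1, so the flattened
            # size is 512 * 1 * 1 (the original model on 224x224 images uses 512 * 7 * 7)
            nn.Linear(512 * 1 * 1, 4096),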
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(4096, 10),
        )
        
    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), -1)
        x = self.classifier(x)
        return x

In [None]:
# # Instantiate your model
# VGG_16 = VGG16().to(device)

# # Train the model
# vgg_loss, trained_VGG_16 = train_model(VGG_16, train_loader, learning_rate=learning_rate,num_epochs=num_epochs, device=device)
 
# # Test the model
# test_model(trained_VGG_16, test_loader, device=device)


# Pretrained Models for Image Classification
https://pytorch.org/vision/stable/models.html
> ![image.png](attachment:0a6ba710-6a97-4831-90cc-f6385c57c30c.png)

In [None]:
from torchvision import models

In [None]:
dir(models)[:20]

# AlexNet (2012)
    AlexNet tackles image classification on a subset of the ImageNet dataset with roughly 1.2 million training images, 50,000 validation images, and 150,000 test images. The input is an image belonging to one of 1000 classes, and the output is a vector of 1000 class scores.

    The images are rescaled to 256x256, and the network is fed 224x224 RGB crops of them (the preprocessing below follows the same resize-then-crop pattern).
    
![image.png](attachment:d8f4f779-9346-4cae-8e81-4b57786ac508.png) 

In [None]:
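# Instantiate the model with ImageNet-pretrained weights
# (`pretrained=True` is deprecated since torchvision 0.13 in favour of the `weights` argument)
alexnet = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)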
print(alexnet)


In [None]:
# Load the image
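from PIL import Image as PILImage  # alias avoids shadowing IPython's Image used earlier
image = PILImage.open('nature.jpg').convert('RGB')  # ensure 3 color channels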
image

In [None]:

# Define the transformations
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])

# Apply the transformations to the image
transformed_image = transform(image)

# Add a batch dimension
input_image = transformed_image.unsqueeze(0)
 
alexnet.eval()

# Make predictions
with torch.no_grad():
    outputs = alexnet(input_image)

# Get the predicted class
_, predicted_class = outputs.max(1)

# Print the predicted class
print(f'Predicted class: {predicted_class.item()}')

# Load ImageNet class labels
imagenet_labels = []
with open('imagenet_1000_labels.txt') as f:
    imagenet_labels = [line.strip() for line in f.readlines()]

# Print the predicted class label
print(f'Predicted class label: {imagenet_labels[predicted_class.item()]}')
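
The cell above reports only the single most likely class. A sketch of ranking the top-k classes with softmax probabilities, using made-up logits and a placeholder label list; in the real cell, `outputs` would be AlexNet's `(1, 1000)` output and the labels would be `imagenet_labels`.

```python
import torch

# Hypothetical logits for one image over 5 classes (ImageNet has 1000)
labels = ['cat', 'dog', 'fox', 'owl', 'car']          # placeholder label list
outputs = torch.tensor([[2.0, 1.0, 0.5, -1.0, 0.0]])  # shape (1, num_classes)

probs = torch.softmax(outputs, dim=1)      # turn raw scores into probabilities
top_probs, top_idx = probs.topk(3, dim=1)  # the 3 most likely classes, highest first

for p, i in zip(top_probs[0], top_idx[0]):
    print(f'{labels[i.item()]}: {p.item():.3f}')
```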


# ResNet (2015)
    ResNet is one of the most popular and successful deep learning architectures to date.

    Commonly used ResNet architectures include ResNet-18, ResNet-34, ResNet-50, ResNet-101, and ResNet-152.
    
## What problems do ResNets solve?
    One of the problems ResNets address is the well-known vanishing gradient problem. When a network is very deep, the gradients propagated back from the loss can shrink toward zero after many applications of the chain rule. As a result, the weights of the early layers barely change, and little learning takes place.

## Residual Block
    The intuition behind residual blocks is that the output of a layer is fed both to the next layer and, through a skip (shortcut) connection, directly to a layer a few steps further on, bypassing the layers in between. The block then outputs F(x) + x, so its layers only have to learn the residual F(x). Residual blocks make it possible to train much deeper neural networks.  
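
A minimal PyTorch sketch of the basic residual block used in ResNet-18/34, assuming the common conv-BN-ReLU ordering and a 1x1 convolution plus batch norm as the projection shortcut when the shape changes. The class name `BasicBlock` is used here for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BasicBlock(nn.Module):
    """Sketch of a basic residual block: y = ReLU(F(x) + shortcut(x))."""

    def __init__(self, in_channels, out_channels, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)

        # Identity shortcut when shapes match; otherwise a 1x1 projection
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_channels, out_channels, 1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels),
            )
        else:
            self.shortcut = nn.Identity()

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))  # first 3x3 conv
        out = self.bn2(self.conv2(out))        # second 3x3 conv: this is F(x)
        out = out + self.shortcut(x)           # skip connection: F(x) + x
        return F.relu(out)


x = torch.randn(2, 64, 56, 56)
print(BasicBlock(64, 64)(x).shape)             # torch.Size([2, 64, 56, 56])
print(BasicBlock(64, 128, stride=2)(x).shape)  # torch.Size([2, 128, 28, 28])
```

Because the shortcut adds `x` back unchanged (or only projected), gradients can flow directly to earlier layers through the addition.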

> ![image.png](attachment:8e1fe62a-75a4-44fe-a4fd-6421922595c7.png) 


![image.png](attachment:f98a0d53-9732-44e0-b7d5-858fe8944fd5.png) 
### ResNet-18 consists of the following blocks:

    - Conv1: Initial convolutional layer followed by BatchNorm, ReLU, and MaxPool.
    - Conv2_x: Two residual blocks, each with two convolutional layers.
    - Conv3_x: Two residual blocks, each with two convolutional layers.
    - Conv4_x: Two residual blocks, each with two convolutional layers.
    - Conv5_x: Two residual blocks, each with two convolutional layers.
    - Final Block: Average pooling, fully connected layer, and output layer.
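
The "18" in the name follows from this block list, counting the layers with weights and, by the usual convention, leaving out the 1x1 shortcut projections. A small arithmetic sketch, which also traces the spatial size for a 224x224 input:

```python
# Count the weight layers of ResNet-18 from the block list above
stem_conv = 1          # Conv1 (7x7, stride 2)
stages = 4             # Conv2_x .. Conv5_x
blocks_per_stage = 2
convs_per_block = 2
fc = 1                 # final fully connected layer

total = stem_conv + stages * blocks_per_stage * convs_per_block + fc
print(total)  # 18

# Spatial size: stride-2 Conv1 and stride-2 max-pool, then Conv3_x, Conv4_x
# and Conv5_x each halve the size again
size = 224 // 2 // 2   # 56 after Conv1 + MaxPool
sizes = [size]
for _ in range(3):
    size //= 2
    sizes.append(size)
print(sizes)  # [56, 28, 14, 7]
```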

In [None]:
#  ResNet-18

class ResNet18(nn.Module):
    def __init__(self, n_classes):
        super(ResNet18, self).__init__()
        
        self.dropout_percentage = 0.5
        self.relu = nn.ReLU()
        
        # BLOCK-1 (starting block) input=(224x224) output=(56x56)
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=(7,7), stride=(2,2), padding=(3,3))
        self.batchnorm1 = nn.BatchNorm2d(64)
        self.maxpool1 = nn.MaxPool2d(kernel_size=(3,3), stride=(2,2), padding=(1,1))
        
        # BLOCK-2 (1) input=(56x56) output = (56x56)
        self.conv2_1_1 = nn.Conv2d(in_channels=64, out_channels=64, kernel_size=(3,3), stride=(1,1), padding=(1,1))
        self.batchnorm2_1_1 = nn.BatchNorm2d(64)
        self.conv2_1_2 = nn.Conv2d(in_channels=64, out_channels=64, kernel_size=(3,3), stride=(1,1), padding=(1,1))
        self.batchnorm2_1_2 = nn.BatchNorm2d(64)
        self.dropout2_1 = nn.Dropout(p=self.dropout_percentage)
        # BLOCK-2 (2)
        self.conv2_2_1 = nn.Conv2d(in_channels=64, out_channels=64, kernel_size=(3,3), stride=(1,1), padding=(1,1))
        self.batchnorm2_2_1 = nn.BatchNorm2d(64)
        self.conv2_2_2 = nn.Conv2d(in_channels=64, out_channels=64, kernel_size=(3,3), stride=(1,1), padding=(1,1))
        self.batchnorm2_2_2 = nn.BatchNorm2d(64)
        self.dropout2_2 = nn.Dropout(p=self.dropout_percentage)
        
        # BLOCK-3 (1) input=(56x56) output = (28x28)
        self.conv3_1_1 = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=(3,3), stride=(2,2), padding=(1,1))
        self.batchnorm3_1_1 = nn.BatchNorm2d(128)
        self.conv3_1_2 = nn.Conv2d(in_channels=128, out_channels=128, kernel_size=(3,3), stride=(1,1), padding=(1,1))
        self.batchnorm3_1_2 = nn.BatchNorm2d(128)
        self.concat_adjust_3 = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=(1,1), stride=(2,2), padding=(0,0))
        self.dropout3_1 = nn.Dropout(p=self.dropout_percentage)
        # BLOCK-3 (2)
        self.conv3_2_1 = nn.Conv2d(in_channels=128, out_channels=128, kernel_size=(3,3), stride=(1,1), padding=(1,1))
        self.batchnorm3_2_1 = nn.BatchNorm2d(128)
        self.conv3_2_2 = nn.Conv2d(in_channels=128, out_channels=128, kernel_size=(3,3), stride=(1,1), padding=(1,1))
        self.batchnorm3_2_2 = nn.BatchNorm2d(128)
        self.dropout3_2 = nn.Dropout(p=self.dropout_percentage)
        
        # BLOCK-4 (1) input=(28x28) output = (14x14)
        self.conv4_1_1 = nn.Conv2d(in_channels=128, out_channels=256, kernel_size=(3,3), stride=(2,2), padding=(1,1))
        self.batchnorm4_1_1 = nn.BatchNorm2d(256)
        self.conv4_1_2 = nn.Conv2d(in_channels=256, out_channels=256, kernel_size=(3,3), stride=(1,1), padding=(1,1))
        self.batchnorm4_1_2 = nn.BatchNorm2d(256)
        self.concat_adjust_4 = nn.Conv2d(in_channels=128, out_channels=256, kernel_size=(1,1), stride=(2,2), padding=(0,0))
        self.dropout4_1 = nn.Dropout(p=self.dropout_percentage)
        # BLOCK-4 (2)
        self.conv4_2_1 = nn.Conv2d(in_channels=256, out_channels=256, kernel_size=(3,3), stride=(1,1), padding=(1,1))
        self.batchnorm4_2_1 = nn.BatchNorm2d(256)
        self.conv4_2_2 = nn.Conv2d(in_channels=256, out_channels=256, kernel_size=(3,3), stride=(1,1), padding=(1,1))
        self.batchnorm4_2_2 = nn.BatchNorm2d(256)
        self.dropout4_2 = nn.Dropout(p=self.dropout_percentage)
        
        # BLOCK-5 (1) input=(14x14) output = (7x7)
        self.conv5_1_1 = nn.Conv2d(in_channels=256, out_channels=512, kernel_size=(3,3), stride=(2,2), padding=(1,1))
        self.batchnorm5_1_1 = nn.BatchNorm2d(512)
        self.conv5_1_2 = nn.Conv2d(in_channels=512, out_channels=512, kernel_size=(3,3), stride=(1,1), padding=(1,1))
        self.batchnorm5_1_2 = nn.BatchNorm2d(512)
        self.concat_adjust_5 = nn.Conv2d(in_channels=256, out_channels=512, kernel_size=(1,1), stride=(2,2), padding=(0,0))
        self.dropout5_1 = nn.Dropout(p=self.dropout_percentage)
        # BLOCK-5 (2)
        self.conv5_2_1 = nn.Conv2d(in_channels=512, out_channels=512, kernel_size=(3,3), stride=(1,1), padding=(1,1))
        self.batchnorm5_2_1 = nn.BatchNorm2d(512)
        self.conv5_2_2 = nn.Conv2d(in_channels=512, out_channels=512, kernel_size=(3,3), stride=(1,1), padding=(1,1))
        self.batchnorm5_2_2 = nn.BatchNorm2d(512)
        self.dropout5_2 = nn.Dropout(p=self.dropout_percentage)
        
        # Final Block input=(7x7) 
        self.avgpool = nn.AvgPool2d(kernel_size=(7,7), stride=(1,1))
        self.fc = nn.Linear(in_features=1*1*512, out_features=1000)
        self.out = nn.Linear(in_features=1000, out_features=n_classes)
        # END
    
    def forward(self, x):
        # ------------------------------ block 1 ------------------------------
        # Apply the first convolution, batch normalization, and ReLU activation
        x = self.relu(self.batchnorm1(self.conv1(x)))
        # Apply max pooling to reduce the spatial dimensions (112x112 -> 56x56)
        block1 = self.maxpool1(x)  # 64

        # ------------------------------ block 2 ------------------------------
        # block 2 - 1
        # First convolution of block 2, followed by batch normalization and ReLU activation
        x = self.relu(self.batchnorm2_1_1(self.conv2_1_1(block1)))
        # Second convolution of block 2, followed by batch normalization (ReLU comes after the addition)
        x = self.batchnorm2_1_2(self.conv2_1_2(x))
        # Dropout (added in this notebook; the original ResNet-18 does not use dropout)
        x = self.dropout2_1(x)  # 64
        # Skip connection: no adjustment needed because the dimensions already match
        block2_1 = self.relu(x + block1)

        # block 2 - 2 (identity block)
        x = self.relu(self.batchnorm2_2_1(self.conv2_2_1(block2_1)))
        x = self.batchnorm2_2_2(self.conv2_2_2(x))
        x = self.dropout2_2(x)
        # Skip connection
        block2 = self.relu(x + block2_1)  # 64

        # ------------------------------ block 3 ------------------------------
        # block 3 - 1 (downsampling block: stride 2, 64 -> 128 channels)
        x = self.relu(self.batchnorm3_1_1(self.conv3_1_1(block2)))
        x = self.batchnorm3_1_2(self.conv3_1_2(x))
        x = self.dropout3_1(x)  # 128
        # Project the skip-connection input with a 1x1, stride-2 convolution so it matches the block's output
        shortcut = self.concat_adjust_3(block2)  # 64 --> 128
        # Skip connection
        block3_1 = self.relu(x + shortcut)  # 128

        # block 3 - 2 (identity block)
        x = self.relu(self.batchnorm3_2_1(self.conv3_2_1(block3_1)))
        x = self.batchnorm3_2_2(self.conv3_2_2(x))
        x = self.dropout3_2(x)
        # Skip connection
        block3 = self.relu(x + block3_1)  # 128

        # ------------------------------ block 4 ------------------------------
        # block 4 - 1 (downsampling block: stride 2, 128 -> 256 channels)
        x = self.relu(self.batchnorm4_1_1(self.conv4_1_1(block3)))
        x = self.batchnorm4_1_2(self.conv4_1_2(x))
        x = self.dropout4_1(x)  # 256
        # Adjust the skip-connection input to match the block's output
        shortcut = self.concat_adjust_4(block3)  # 128 --> 256
        block4_1 = self.relu(x + shortcut)  # 256

        # block 4 - 2 (identity block)
        x = self.relu(self.batchnorm4_2_1(self.conv4_2_1(block4_1)))
        x = self.batchnorm4_2_2(self.conv4_2_2(x))
        x = self.dropout4_2(x)
        block4 = self.relu(x + block4_1)  # 256

        # ------------------------------ block 5 ------------------------------
        # block 5 - 1 (downsampling block: stride 2, 256 -> 512 channels)
        x = self.relu(self.batchnorm5_1_1(self.conv5_1_1(block4)))
        x = self.batchnorm5_1_2(self.conv5_1_2(x))
        x = self.dropout5_1(x)  # 512
        # Adjust the skip-connection input to match the block's output
        shortcut = self.concat_adjust_5(block4)  # 256 --> 512
        block5_1 = self.relu(x + shortcut)  # 512

        # block 5 - 2 (identity block)
        x = self.relu(self.batchnorm5_2_1(self.conv5_2_1(block5_1)))
        # Second convolution of this block, followed by its own batch normalization
        x = self.batchnorm5_2_2(self.conv5_2_2(x))
        x = self.dropout5_2(x)
        block5 = self.relu(x + block5_1)  # 512

        # ---------------------- final block: classifier ----------------------
        # Average pooling reduces the 7x7 feature maps to 1x1
        x = self.avgpool(block5)
        # Flatten to (batch, 512) for the fully connected layer
        x = x.reshape(x.shape[0], -1)
        # Fully connected layer followed by ReLU activation
        x = self.relu(self.fc(x))  # 512 --> 1000
        # Final output layer producing the class scores
        x = self.out(x)  # 1000 --> n_classes
        return x
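The `concat_adjust_3`, `concat_adjust_4` and `concat_adjust_5` layers in the class above exist because the first block of each of those stages changes the tensor shape. The block halves the height and width (stride 2) and doubles the channels, so the raw input can no longer be added to the block's output. The sketch below shows this in isolation, assuming a single 64-channel 56x56 feature map such as the output of the conv2_x stage. Note that torchvision's own ResNet follows the 1x1 projection with a BatchNorm layer, which the notebook's version omits.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 56, 56)  # assumed output of the conv2_x stage (batch of 1)

# Main path of the first conv3_x block: stride 2 halves H and W, channels 64 -> 128.
main = nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1)(x)
print(main.shape)  # torch.Size([1, 128, 28, 28])

# Adding the raw input fails because the shapes no longer match.
try:
    main + x
    identity_ok = True
except RuntimeError:
    identity_ok = False
print("identity shortcut works:", identity_ok)  # False

# A 1x1 convolution with the same stride projects the shortcut to the new shape.
projection = nn.Conv2d(64, 128, kernel_size=1, stride=2)
out = torch.relu(main + projection(x))
print(out.shape)  # torch.Size([1, 128, 28, 28])
```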


In [None]:
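# Load torchvision's ResNet-18 with ImageNet-pretrained weights
# (the `weights` argument replaces the deprecated `pretrained=True` in torchvision >= 0.13)
from torchvision import models
resnet18 = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
print(resnet18)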

In [None]:
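import torch
from PIL import Image
from torchvision import transforms

# Load the image and make sure it has 3 (RGB) channels, as Normalize below expects
image = Image.open('nature.jpg').convert('RGB')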

# Define the transformations
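transform = transforms.Compose([
    transforms.Resize(256),       # resize the shorter side to 256 pixels
    transforms.CenterCrop(224),   # crop the central 224x224 region that ResNet-18 expects
    transforms.ToTensor(),        # PIL image -> float tensor in [0, 1], shape (3, H, W)
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])  # ImageNet channel statistics
])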


# Apply the transformations to the image
transformed_image = transform(image)

# Add a batch dimension
input_image = transformed_image.unsqueeze(0)
 
resnet18.eval()

# Make predictions
with torch.no_grad():
    outputs = resnet18(input_image)

# Get the predicted class
_, predicted_class = outputs.max(1)

# Print the predicted class
print(f'Predicted class: {predicted_class.item()}')

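# Load ImageNet class labels (assumes one label per line, in class-index order)
# Alternatively: imagenet_labels = models.ResNet18_Weights.DEFAULT.meta["categories"]
with open('imagenet_1000_labels.txt') as f:
    imagenet_labels = [line.strip() for line in f]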

# Print the predicted class label
print(f'Predicted class label: {imagenet_labels[predicted_class.item()]}')
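The model returns raw scores (logits), so `outputs.max(1)` gives the top-1 class but no confidence value. A minimal sketch of turning the scores into probabilities and listing the five most likely classes follows. It uses a random stand-in tensor named `logits` with the same shape as `outputs` (batch of 1, 1000 ImageNet classes) so that it runs on its own. In the notebook you would pass `outputs` instead and print `imagenet_labels[i]` next to each probability.

```python
import torch

torch.manual_seed(0)
logits = torch.randn(1, 1000)  # stand-in for `outputs`, shape (batch, num_classes)

probs = torch.softmax(logits, dim=1)       # turn raw scores into probabilities
top_probs, top_idx = probs.topk(5, dim=1)  # five most likely classes, highest first

# Softmax preserves the ordering, so the top-1 index matches outputs.max(1)
_, predicted_class = logits.max(1)
print(top_idx[0, 0].item() == predicted_class.item())  # True

for p, i in zip(top_probs[0].tolist(), top_idx[0].tolist()):
    print(f"class {i}: {p:.4f}")
```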
