# Data Pipeline

We don't have to build the model around every class from the start. We can begin with a small subset of classes and scale up gradually after checking performance.

# CNN: Key Layers

## Convolutional Layer (nn.Conv2d)

This is the core building block of a CNN, using learnable filters to scan the image for visual features. The output is a set of "feature maps" that highlight where in the image these patterns appear.

- in_channels: The number of channels from the previous layer; for the first layer, this is 3 for the RGB color channels.
- out_channels: The number of filters the layer will learn, determining the number of output feature maps.
- kernel_size: The dimensions of the filter, such as a 3x3 grid that examines a pixel and its immediate neighbors.
- padding: Adds a border of pixels (zeros by default) around the input so the kernel can be centered on edge pixels. With a 3x3 kernel and a stride of 1, padding=1 keeps the output the same height and width as the input.
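
To make these parameters concrete, here is a minimal sketch that passes one random image through a convolution with and without padding. The 32x32 input size is an assumption chosen for illustration; the layer settings mirror the ones described above.

```python
import torch
import torch.nn as nn

# One RGB image of size 32x32 (the size is an assumption for this sketch).
image = torch.randn(1, 3, 32, 32)

# 3 input channels (RGB), 32 learnable filters, 3x3 kernel, 1-pixel border.
conv = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, padding=1)
feature_maps = conv(image)

# Same layer without padding: the 3x3 kernel cannot be centered on edge
# pixels, so the output loses one pixel on every side.
conv_no_pad = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, padding=0)
shrunk_maps = conv_no_pad(image)

print(feature_maps.shape)  # torch.Size([1, 32, 32, 32])
print(shrunk_maps.shape)   # torch.Size([1, 32, 30, 30])
```

The second dimension of the output is always out_channels: one feature map per filter.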

## ReLU Activation Function (nn.ReLU)

An activation function that introduces non-linearity by changing all negative values in the feature maps to zero. This helps the model learn more complex patterns.
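
A quick sketch of what ReLU does to a handful of values (the values themselves are made up for illustration):

```python
import torch
import torch.nn as nn

relu = nn.ReLU()

# Arbitrary example values: two negatives, a zero, two positives.
values = torch.tensor([-2.0, -0.5, 0.0, 1.5, 3.0])
activated = relu(values)

# Negatives become zero; non-negative values pass through unchanged.
print(activated.tolist())  # [0.0, 0.0, 0.0, 1.5, 3.0]
```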

## Max Pooling Layer (nn.MaxPool2d)

This layer downsamples the feature maps by reducing their height and width, which makes the network more efficient. It slides a window over the feature map and keeps only the single largest value from that window, discarding the rest.

- kernel_size: The size of the window to perform pooling on, such as a 2x2 area.
- stride: The step size the window moves across the image. A stride of 2 with a 2x2 kernel will halve the feature map's dimensions.
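
The windowing described above can be checked on a tiny feature map. This is a sketch on a hand-made 4x4 input (values 0 to 15), not data from the actual pipeline:

```python
import torch
import torch.nn as nn

# A single 4x4 feature map with values 0..15.
# Shape is (batch, channels, height, width).
feature_map = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)

pool = nn.MaxPool2d(kernel_size=2, stride=2)
pooled = pool(feature_map)

# Each non-overlapping 2x2 window is replaced by its largest value,
# so height and width are both halved.
print(pooled.shape)           # torch.Size([1, 1, 2, 2])
print(pooled[0, 0].tolist())  # [[5.0, 7.0], [13.0, 15.0]]
```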

## Flatten Layer (nn.Flatten)

A utility layer that unrolls the 2D feature maps into a single 1D vector. This is a necessary step to prepare the data for the fully connected linear layers.

## Linear Layer (nn.Linear)

Also known as a fully connected layer, it performs the final classification. It combines the features learned by the convolutional layers into a final prediction.
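
The hand-off from flattened feature maps to a fully connected layer can be sketched as follows. The batch size of 2, the 128x4x4 feature map shape, and the 10 output classes are all assumptions picked for illustration.

```python
import torch
import torch.nn as nn

# Hypothetical output of the convolutional part:
# 2 samples, each with 128 feature maps of size 4x4.
feature_maps = torch.randn(2, 128, 4, 4)

flatten = nn.Flatten()           # keeps the batch dimension, unrolls the rest
fc = nn.Linear(128 * 4 * 4, 10)  # 10 classes is an assumed value

flat = flatten(feature_maps)
scores = fc(flat)                # one raw score per class and sample

print(flat.shape)    # torch.Size([2, 2048])
print(scores.shape)  # torch.Size([2, 10])
```

The in_features argument of nn.Linear has to equal the flattened size, which is why the flatten step comes first.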

## Dropout Layer (nn.Dropout)

A regularization technique that helps prevent overfitting by randomly setting a fraction of neuron activations to zero during training. This forces the network to learn more robust features instead of relying too heavily on any single pattern.
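
Dropout behaves differently in training and evaluation mode, which is easy to miss. A minimal sketch on a vector of ones (the vector length and seed are arbitrary choices):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # arbitrary seed, only for repeatability

dropout = nn.Dropout(p=0.5)
activations = torch.ones(1000)

# Training mode: each value is zeroed with probability p, and the
# survivors are scaled by 1 / (1 - p) = 2 to keep the expected sum stable.
dropout.train()
train_out = dropout(activations)

# Evaluation mode: dropout is a no-op.
dropout.eval()
eval_out = dropout(activations)

zero_fraction = (train_out == 0).float().mean().item()
print(zero_fraction)                       # roughly 0.5
print(torch.equal(eval_out, activations))  # True
```

This is why calling model.train() before training and model.eval() before validation matters for models that contain dropout.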

In [None]:
import torch.nn as nn

class SimpleCNN(nn.Module):

    def __init__(self, num_of_output_classes: int):
        
        super(SimpleCNN, self).__init__()

        self.conv1 = nn.Conv2d(in_channels=3, out_channels=32, kernel_size=3, padding=1)
        self.relu1 = nn.ReLU()
        self.pool1 = nn.MaxPool2d(kernel_size=2, stride=2)

        self.conv2 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, padding=1)
        self.relu2 = nn.ReLU()
        self.pool2 = nn.MaxPool2d(kernel_size=2, stride=2)

        self.conv3 = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=3, padding=1)
        self.relu3 = nn.ReLU()
        self.pool3 = nn.MaxPool2d(kernel_size=2, stride=2)

        self.flatten = nn.Flatten()

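        # 128 channels x 4 x 4 spatial size. This matches 32x32 inputs
        # (an assumption), since each of the three pooling layers halves the size.
        self.fc1 = nn.Linear(128 * 4 * 4, 512)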
        self.relu4 = nn.ReLU()
        self.dropout = nn.Dropout(0.5)
        self.fc2 = nn.Linear(512, num_of_output_classes)

    def forward(self, x):

        x = self.conv1(x)
        x = self.relu1(x)
        x = self.pool1(x)

        x = self.conv2(x)
        x = self.relu2(x)
        x = self.pool2(x)

        x = self.conv3(x)
        x = self.relu3(x)
        x = self.pool3(x)

        x = self.flatten(x)

        x = self.fc1(x)
        x = self.relu4(x)
        x = self.dropout(x)
        x = self.fc2(x)

        return x
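
The 128 * 4 * 4 passed to the first linear layer can be derived by tracing the spatial size through the three conv/pool stages. The sketch below does this in plain Python with the standard output-size formula; the 32x32 input size is an assumption (the notes do not state it), and conv_output_size is a helper name invented for this example.

```python
def conv_output_size(size: int, kernel_size: int, stride: int = 1, padding: int = 0) -> int:
    """Output height/width of a conv or pooling layer (standard formula)."""
    return (size + 2 * padding - kernel_size) // stride + 1

size = 32      # assumed input height and width
channels = 3   # RGB

for out_channels in (32, 64, 128):
    # 3x3 conv with padding=1 keeps the spatial size.
    size = conv_output_size(size, kernel_size=3, stride=1, padding=1)
    # 2x2 max pool with stride 2 halves it.
    size = conv_output_size(size, kernel_size=2, stride=2)
    channels = out_channels
    print(channels, size, size)

flattened_features = channels * size * size
print(flattened_features)  # 2048, i.e. 128 * 4 * 4
```

With a different input resolution the flattened size changes, and the in_features of the first linear layer has to change with it.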


# Initialize Loss Function and Optimizer

We'll use **nn.CrossEntropyLoss**. This is the standard loss function for multi-class classification tasks as it's designed to measure the error when a model has to choose one class from several possibilities.

We'll use the **Adam** optimizer. This is a popular and efficient algorithm that updates the model's weights to minimize the loss.

In [None]:
import torch.optim as optim

loss_function = nn.CrossEntropyLoss()

# prototype_model is an instance of SimpleCNN created beforehand.
optimizer_prototype = optim.Adam(prototype_model.parameters(), lr=0.001)
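
To show how the loss function and optimizer work together, here is a sketch of a single training step. It uses a tiny stand-in model (a single linear layer) and random data, both hypothetical, so that the block runs on its own; the same four calls apply to the CNN.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)  # arbitrary seed, only for repeatability

# Stand-in model and data (hypothetical): 8 samples, 20 features, 3 classes.
model = nn.Linear(20, 3)
inputs = torch.randn(8, 20)
targets = torch.randint(0, 3, (8,))  # class indices, not one-hot vectors

loss_function = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

weights_before = model.weight.detach().clone()

optimizer.zero_grad()                  # clear gradients from any earlier step
logits = model(inputs)                 # raw scores, no softmax applied
loss = loss_function(logits, targets)  # the loss applies log-softmax itself
loss.backward()                        # compute gradients
optimizer.step()                       # update the weights

print(loss.item())
```

Because nn.CrossEntropyLoss expects raw scores, the model's last layer does not need a softmax.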

# Dynamic Graphs

Earlier deep learning frameworks required the computational graph (the sequence of compute steps a model takes to arrive at a result) to be structured and defined before any data ran through it. This rigid structure made debugging, branching, and similar tasks difficult. It is much like nn.Sequential(), where the computation follows a fixed path.

PyTorch is more flexible because it builds the computational graph dynamically. The graph is built during the forward pass and is then used by the backward pass to compute gradients. By default it is freed once the backward pass finishes, and the next forward pass builds a new one. Because the graph is rebuilt on every pass, the forward code can use ordinary Python control flow such as loops and conditionals.
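
A small sketch of what this flexibility allows: a forward pass whose depth and branching are decided at call time. DynamicNet and its num_repeats argument are hypothetical names made up for this example.

```python
import torch
import torch.nn as nn

class DynamicNet(nn.Module):
    """Hypothetical module whose computation path is chosen per call."""

    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(4, 4)

    def forward(self, x, num_repeats: int):
        # The loop length is a runtime argument, so the graph can have
        # a different depth on every call.
        for _ in range(num_repeats):
            x = torch.relu(self.layer(x))
        # A data-dependent branch: plain Python, no special graph ops.
        if x.sum() > 0:
            x = x * 2
        return x

net = DynamicNet()
x = torch.randn(2, 4)

for num_repeats in (1, 3):
    net.zero_grad()
    out = net(x, num_repeats)  # a fresh graph is built here
    out.sum().backward()       # gradients flow through whatever path ran
    print(num_repeats, tuple(net.layer.weight.grad.shape))
```

Both calls backpropagate correctly even though they run a different number of layers.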

# Modular Architectures

The CNN code above looks redundant, with too many repetitions. But this is actually a good thing: writing it all out gives us an idea of what the overall flow looks like. From there you can identify patterns and refactor the code.

In [None]:
class ConvLayer(nn.Module):

    def __init__(self, channel_in, channel_out):
        super().__init__()

        self.block = nn.Sequential(
            nn.Conv2d(in_channels=channel_in, out_channels=channel_out, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )

    def forward(self, x):
        return self.block(x)
    
class CNNModel(nn.Module):

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

        self.features = nn.Sequential(
            ConvLayer(3, 32),
            ConvLayer(32, 64),
            ConvLayer(64, 128)
        )

        self.classifier = nn.Sequential(
            # Unroll the 4D feature maps into (batch, features) before the linear layers.
            nn.Flatten(),
            nn.Linear(128 * 4 * 4, 512),
            nn.ReLU(),
            nn.Dropout(p=0.5),
            nn.Linear(512, 10)
        )

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x)

# How do you see what's inside your model?

## How many parameters does your model have?

In [None]:
total_params = sum(param.numel() for param in model.parameters())

This just gives you a number, but if you need more details, use this:

In [None]:
for name, param in model.named_parameters():
    # Print each parameter tensor's name and shape.
    print(name, param.shape)

To list the top-level layers, such as the custom ones we designed, we can use model.named_children(). This only gives a top-level view: layers nested inside a child, such as those in an nn.Sequential, are not shown.

To go deeper, use model.named_modules(), which walks through every nested module.
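
The difference between the two views is easiest to see side by side. The sketch below uses a small stand-in model (TinyNet, a hypothetical name) rather than the CNN from these notes.

```python
import torch.nn as nn

class TinyNet(nn.Module):
    """Hypothetical model with one nested Sequential and one plain layer."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.classifier = nn.Linear(8, 2)

model = TinyNet()

# Direct children only.
child_names = [name for name, _ in model.named_children()]
# Every module, recursively; the model itself appears with an empty name.
module_names = [name for name, _ in model.named_modules()]
# Conv: 3*8*3*3 weights + 8 biases = 224. Linear: 8*2 weights + 2 biases = 18.
total_params = sum(param.numel() for param in model.parameters())

print(child_names)   # ['features', 'classifier']
print(module_names)  # ['', 'features', 'features.0', 'features.1', 'classifier']
print(total_params)  # 242
```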

When shape errors occur, we can inspect the output shape of each layer during the forward pass.
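
One way to do this without editing the model's forward method is a forward hook, registered with register_forward_hook. The sketch below is one possible approach, shown on a small stand-in nn.Sequential with an assumed 32x32 input; record_shape is a helper name invented for this example.

```python
import torch
import torch.nn as nn

# Stand-in model (hypothetical) for a 32x32 RGB input.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),
    nn.Flatten(),
    nn.Linear(8 * 16 * 16, 10),
)

output_shapes = []

def record_shape(module, inputs, output):
    # Called automatically after each hooked layer's forward pass.
    output_shapes.append((module.__class__.__name__, tuple(output.shape)))

# Attach the hook to every layer in the Sequential.
handles = [layer.register_forward_hook(record_shape) for layer in model]

model(torch.randn(1, 3, 32, 32))

# Remove the hooks so later forward passes are not affected.
for handle in handles:
    handle.remove()

for layer_name, shape in output_shapes:
    print(layer_name, shape)
```

If a layer raises a shape error, the last shape recorded before the failure shows what was fed into it.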