# Architecture Review: Cifar10 with PyTorch
![image](output.png)
## Introduction
This notebook reviews the architectures used in the PyTorch tutorial for the CIFAR10 dataset. It contains the complete code for the tutorial, with minor modifications to make it more readable and to adapt the architectures to this dataset. To find the practice notebook, please visit the following link: [Practice Notebook]().

### Pre-requisites
This notebook assumes that you have a basic understanding of neural networks and PyTorch. If you are new to PyTorch, please visit the following link to get started: [PyTorch Tutorials](https://pytorch.org/tutorials/). Please complete the [MNIST tutorial](https://pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial.html) before starting this tutorial.

### Dataset
The CIFAR10 dataset consists of 60,000 32x32 color images in 10 classes, with 6,000 images per class. There are 50,000 training images and 10,000 test images. The classes are completely mutually exclusive: there is no overlap between automobiles and trucks. "Automobile" includes sedans, SUVs, and other similar vehicles. "Truck" includes only big trucks.

The test batch contains 1,000 randomly selected images from each class. The training batches contain the remaining images in random order, so an individual training batch may contain more images from one class than another. Between them, the training batches contain exactly 5,000 images from each class. Although CIFAR10 is far removed from real-world applications, it is a good dataset for learning the basics of deep learning because it is small and easy to work with.

**Classes**:
- airplane
- automobile
- bird
- cat
- deer
- dog
- frog
- horse
- ship
- truck

**References**:
- [CIFAR10 Dataset](https://www.cs.toronto.edu/~kriz/cifar.html)
- [CIFAR10 Dataset PyTorch](https://pytorch.org/vision/stable/datasets.html#cifar)

**Criticism**:
The CIFAR10 dataset is very small and is not representative of real-world data. It is used for educational purposes and to learn the basics of deep learning. Interesting read: [Once Upon a Time in CIFAR-10](https://franky07724-57962.medium.com/once-upon-a-time-in-cifar-10-c26bb056b4ce#:~:text=However%2C%20the%20quality%20of%20CIFAR,10%20contains%200.54%25%20label%20errors.)


**Note**: 
There is also a CIFAR100 dataset, which has 100 classes. For the extra challenge, you can try working with the CIFAR100 dataset.

---

## Code setup
Let's start by importing the necessary libraries and setting up the code for the tutorial.

In [None]:
import torch
import torch.nn as nn
import torchvision as tv
import tqdm
import matplotlib.pyplot as plt
from typing import Tuple, List, Union


We should also set the seed for the random number generator to ensure reproducibility. This is important when working with neural networks since the weights are initialized randomly, and the order of the data can affect the training process. By setting the seed, we ensure that the random number generator produces the same sequence of numbers every time we run the code.

We should also set the device to use the 'cuda' (NVIDIA GPU), 'mps' (Mac GPU), or 'cpu', depending on the availability of the GPU. If you are using Google Colab, you can change the runtime to use a GPU by going to 'Runtime' -> 'Change runtime type' -> 'Hardware accelerator' -> 'GPU'.

In [None]:
torch.manual_seed(0)

if torch.cuda.is_available():  # checks for a usable CUDA GPU, not just a cuDNN build
    device = torch.device('cuda')
elif torch.backends.mps.is_available():
    device = torch.device('mps')
else:
    device = torch.device('cpu')


Then we need to load the CIFAR10 dataset using the `torchvision` library. The dataset is divided into training and testing sets, and each set contains images and their corresponding labels (0-9). The images are also normalized with a per-channel mean of 0.5 and a standard deviation of 0.5, which maps pixel values from the range [0, 1] to [-1, 1]. Centering the inputs around zero generally helps training, since it lets the weights be updated more evenly.
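As a quick check of what the normalization does, the per-channel formula `(pixel - mean) / std` can be worked through by hand. This is a plain-Python sketch that does not call torchvision; the helper name `normalize` is made up for illustration, and it assumes the input is already in [0, 1], as produced by `ToTensor`.

```python
def normalize(pixel: float, mean: float = 0.5, std: float = 0.5) -> float:
    """Apply (pixel - mean) / std to a single value in [0, 1]."""
    return (pixel - mean) / std


# The darkest, middle, and brightest values map to -1, 0, and 1.
print(normalize(0.0), normalize(0.5), normalize(1.0))  # -1.0 0.0 1.0
```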

The batch size is set to 64, though you can change it to a different value if you like. The batch size determines how many images are processed at once during training. A larger batch size can speed up training but requires more memory. A smaller batch size can slow down training but may lead to better generalization.

Finally, we create a DataLoader for the training and testing sets. The DataLoader is used to load the data in batches during training and testing. It also shuffles the data during training to ensure that the model learns from different samples in each epoch.

In [None]:
transforms = tv.transforms.Compose([
    tv.transforms.ToTensor(),
    tv.transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)) # mean, std
])

train_dataset = tv.datasets.CIFAR10("./data", train=True, transform=transforms, download=True)
test_dataset = tv.datasets.CIFAR10("./data", train=False, transform=transforms, download=True)

train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=64, shuffle=False)
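
To make the effect of the batch size concrete, the number of batches per epoch can be computed by hand. The sketch below is plain Python and assumes the DataLoader keeps the final, smaller batch (the default behavior when `drop_last` is not set); `batches_per_epoch` is a hypothetical helper, not a PyTorch function.

```python
import math
from typing import Tuple


def batches_per_epoch(num_samples: int, batch_size: int) -> Tuple[int, int]:
    """Return (number of batches, size of the last batch) when no batch is dropped."""
    num_batches = math.ceil(num_samples / batch_size)
    last_batch = num_samples - (num_batches - 1) * batch_size
    return num_batches, last_batch


print(batches_per_epoch(50_000, 64))  # (782, 16) for the training set
print(batches_per_epoch(10_000, 64))  # (157, 16) for the test set
```

With a batch size of 64, each training epoch therefore performs 782 optimizer steps.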


We also define an `evaluate_model` function to evaluate the model on a given dataset. This function returns both the accuracy and the average loss of the model on that dataset.

Then we define a `train_model` function to train the model on the training dataset. This function takes the model, the training and test DataLoaders, the number of epochs, the learning rate, and the momentum as input. It creates a cross-entropy loss and an SGD optimizer internally, trains the model for the specified number of epochs, and returns the per-epoch accuracies and losses for both datasets.

And a `plot` function to plot the training and testing metrics.
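
One detail worth noting in `evaluate_model` is that each batch loss is multiplied by the batch size before averaging. Since the loss returned for a batch is already a mean over that batch, a plain mean of batch means gives the smaller final batch too much weight. The sketch below illustrates the difference with made-up numbers; both helper names are hypothetical.

```python
from typing import List, Tuple

Batch = Tuple[int, float]  # (batch size, mean loss over that batch)


def mean_of_batch_means(batches: List[Batch]) -> float:
    """Average the batch losses, ignoring how many samples each batch holds."""
    return sum(loss for _, loss in batches) / len(batches)


def per_sample_mean(batches: List[Batch]) -> float:
    """Weight each batch loss by its size, giving the true per-sample average."""
    total = sum(size for size, _ in batches)
    return sum(size * loss for size, loss in batches) / total


batches = [(64, 1.0), (16, 2.0)]  # illustrative values only
print(mean_of_batch_means(batches))  # 1.5
print(per_sample_mean(batches))      # 1.2
```

In practice the gap is small here, because only the last batch of each epoch differs in size.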

In [None]:
def evaluate_model(model: nn.Module, loader: torch.utils.data.DataLoader, criterion: nn.Module) -> Tuple[float, float]:
    """ Evaluate the model on the given dataset and return both average accuracy and average loss. """
    model.eval()
    correct = 0
    total = 0
    total_loss = 0.0

    with torch.no_grad():
        for x, y in tqdm.tqdm(
            loader, leave=False, desc="Evaluating", total=len(loader)
        ):
            x = x.to(device)
            y = y.to(device)
            y_pred = model(x)
            loss = criterion(y_pred, y)
            total_loss += loss.item() * x.size(0)  # Accumulate loss scaled by batch size
            correct += (y_pred.argmax(1) == y).sum().item()
            total += y.size(0)

    average_loss = total_loss / total  # Calculate the average loss
    accuracy = correct / total
    return accuracy, average_loss


def train_model(
    model: nn.Module,
    train_loader: torch.utils.data.DataLoader,
    test_loader: torch.utils.data.DataLoader,
    epochs: int,
    lr: float,
    momentum: float,
) -> Tuple[Tuple[List[float], List[float]], Tuple[List[float], List[float]]]:
    """ Train the model on the given dataset and track accuracies and losses. """
    train_accuracies = []
    test_accuracies = []
    train_losses = []
    test_losses = []
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=momentum)

    with tqdm.tqdm(range(epochs), desc="Training", unit="epoch") as epochs_bar:
        for epoch in epochs_bar:
            model.train()
            train_epoch_losses = []
            for x, y in tqdm.tqdm(train_loader, leave=False, desc="Epoch", total=len(train_loader)):
                x, y = x.to(device), y.to(device)
                y_pred = model(x)
                loss = criterion(y_pred, y)
                train_epoch_losses.append(loss.item())

                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

            # Compute average loss for the epoch
            avg_train_loss = sum(train_epoch_losses) / len(train_epoch_losses)
            train_losses.append(avg_train_loss)

            # Evaluate model for both training and testing sets
            train_accuracy, train_loss = evaluate_model(model, train_loader, criterion)
            test_accuracy, test_loss = evaluate_model(model, test_loader, criterion)

            train_accuracies.append(train_accuracy)
            test_accuracies.append(test_accuracy)
            test_losses.append(test_loss)

            epochs_bar.set_postfix(
                train_loss=train_loss,
                test_loss=test_loss,
                train_accuracy=train_accuracy,
                test_accuracy=test_accuracy
            )

    return (train_accuracies, test_accuracies), (train_losses, test_losses)


def plot(accuracies: Tuple[List[float], List[float]], losses: Tuple[List[float], List[float]]) -> None:
    """Plot the training progress of the model."""
    plt.figure(figsize=(20, 5))
    # Plot Accuracy
    plt.subplot(1, 2, 1)
    plt.plot(accuracies[0], label="Training Accuracy", color='b', linewidth=2)
    plt.plot(accuracies[1], label="Test Accuracy", color='r', linewidth=2)
    
    # Compute and plot EMA for training accuracies
    ema_train = [accuracies[0][0]]  # Start EMA from the first actual accuracy
    alpha = 0.2  # Smoothing factor for EMA
    for acc in accuracies[0][1:]:
        ema = alpha * acc + (1 - alpha) * ema_train[-1]
        ema_train.append(ema)
    plt.plot(ema_train, label="EMA Training Accuracy", linestyle="--", color='darkblue', alpha=0.75)
    
    # Compute and plot EMA for test accuracies
    ema_test = [accuracies[1][0]]  # Start EMA from the first actual accuracy
    for acc in accuracies[1][1:]:
        ema = alpha * acc + (1 - alpha) * ema_test[-1]
        ema_test.append(ema)
    plt.plot(ema_test, label="EMA Test Accuracy", linestyle="--", color='darkred', alpha=0.75)
    
    plt.title("Accuracy Progress", fontsize=16)
    plt.xlabel("Epoch", fontsize=14)
    plt.ylabel("Accuracy", fontsize=14)
    plt.legend()
    plt.grid(True, linestyle='--', linewidth=0.5, color='gray', alpha=0.5)
    
    # Plot Loss
    plt.subplot(1, 2, 2)
    plt.plot(losses[0], label="Training Loss", color='b', linewidth=2)
    plt.plot(losses[1], label="Test Loss", color='r', linewidth=2)
    
    # Compute and plot EMA for training losses
    ema_train = [losses[0][0]]  # Start EMA from the first actual loss
    alpha = 0.5  # Smoothing factor for EMA
    for loss in losses[0][1:]:
        ema = alpha * loss + (1 - alpha) * ema_train[-1]
        ema_train.append(ema)
    plt.plot(ema_train, label="EMA Training Loss", linestyle="--", color='darkblue', alpha=0.75)
    
    # Compute and plot EMA for test losses
    ema_test = [losses[1][0]]  # Start EMA from the first actual loss
    for loss in losses[1][1:]:
        ema = alpha * loss + (1 - alpha) * ema_test[-1]
        ema_test.append(ema)
    plt.plot(ema_test, label="EMA Test Loss", linestyle="--", color='darkred', alpha=0.75)
    
    plt.title("Loss Progress", fontsize=16)
    plt.xlabel("Epoch", fontsize=14)
    plt.ylabel("Loss", fontsize=14)
    plt.legend()
    plt.grid(True, linestyle='--', linewidth=0.5, color='gray', alpha=0.5)

    plt.tight_layout()
    plt.show()
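
The dashed curves drawn by `plot` are exponential moving averages (EMA) of the raw metrics. The same recurrence can be isolated in a few lines; the standalone `ema` helper below is a sketch written for illustration and is not called by the notebook.

```python
from typing import List


def ema(values: List[float], alpha: float) -> List[float]:
    """Smooth a series: each point mixes the new value with the previous average."""
    smoothed = [values[0]]  # start from the first actual value
    for value in values[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


print(ema([1.0, 2.0, 3.0], alpha=0.5))  # [1.0, 1.5, 2.25]
```

A smaller `alpha` gives a smoother curve that reacts more slowly to changes, which is why `plot` uses 0.2 for the accuracies and 0.5 for the losses.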


### Linear Model

#### Single Layer Perceptron

In [None]:
# Set random seed for reproducibility
torch.manual_seed(0)

# Define a simple model
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(32*32*3, 10)
).to(device)

# Train the model 
metrics = train_model(model, train_loader, test_loader, epochs=10, lr=0.01, momentum=0.9)

# Plot the training progress
plot(*metrics)

# Print final accuracies
print(f"Final training accuracies: {metrics[0][0][-5:]}")
print(f"Final test accuracies: {metrics[0][1][-5:]}")


#### Multi Layer Perceptron

In [None]:
# Set random seed for reproducibility
torch.manual_seed(0)

# Define a multi-layer perceptron with two hidden layers
model = torch.nn.Sequential(
    torch.nn.Flatten(),
    torch.nn.Linear(32*32*3, 512),
    torch.nn.ReLU(),
    torch.nn.Linear(512, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 10)
).to(device)

# Train the model 
metrics = train_model(model, train_loader, test_loader, epochs=10, lr=0.01, momentum=0.9)

# Plot the training progress
plot(*metrics)

# Print final accuracies
print(f"Final training accuracies: {metrics[0][0][-5:]}")
print(f"Final test accuracies: {metrics[0][1][-5:]}")
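
To compare the capacity of the two linear models, their trainable parameters can be counted by hand: each `Linear` layer holds `in * out` weights plus `out` biases. The sketch below is plain Python; `mlp_param_count` is a hypothetical helper, and the layer sizes are copied from the two models above.

```python
from typing import List


def mlp_param_count(layer_sizes: List[int]) -> int:
    """Weights plus biases for a stack of fully connected layers."""
    pairs = zip(layer_sizes, layer_sizes[1:])
    return sum(n_in * n_out + n_out for n_in, n_out in pairs)


print(mlp_param_count([32 * 32 * 3, 10]))            # 30730 (single layer)
print(mlp_param_count([32 * 32 * 3, 512, 256, 10]))  # 1707274 (multi layer)
```

The multi-layer perceptron has roughly 55 times as many parameters, almost all of them in its first layer.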


### LeNet-5

In [None]:
torch.manual_seed(0)

model = nn.Sequential(
    nn.Conv2d(3, 6, kernel_size=5),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(6, 16, kernel_size=5),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120),
    nn.ReLU(),
    nn.Linear(120, 84),
    nn.ReLU(),
    nn.Linear(84, 10),
).to(device)

# Train the model 
metrics = train_model(model, train_loader, test_loader, epochs=10, lr=0.01, momentum=0.9)

# Plot the training progress
plot(*metrics)

# Print final accuracies
print(f"Final training accuracies: {metrics[0][0][-5:]}")
print(f"Final test accuracies: {metrics[0][1][-5:]}")
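
The `16 * 5 * 5` input size of the first `Linear` layer follows from how each convolution and pooling layer shrinks the image. The sketch below traces the spatial size with the standard output-size formula, assuming no padding and the default pooling stride equal to the kernel size; `conv_out` is a hypothetical helper.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output width/height of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


size = 32                                  # CIFAR10 images are 32x32
size = conv_out(size, kernel=5)            # Conv2d(3, 6, 5)   -> 28
size = conv_out(size, kernel=2, stride=2)  # MaxPool2d(2)      -> 14
size = conv_out(size, kernel=5)            # Conv2d(6, 16, 5)  -> 10
size = conv_out(size, kernel=2, stride=2)  # MaxPool2d(2)      -> 5
print(size, 16 * size * size)  # 5 400
```

Repeating the trace with `kernel=3` for both convolutions gives a final size of 6, which matches the `64*6*6` used in the batch normalization model.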


### More Modern CNNs and Better Training Techniques

In [None]:
torch.manual_seed(0)

model = nn.Sequential(
    nn.Conv2d(3, 32, kernel_size=5),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=5),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(64*5*5, 120),
    nn.ReLU(),
    nn.Linear(120, 84),
    nn.ReLU(),
    nn.Linear(84, 10)
).to(device)

metrics = train_model(model, train_loader, test_loader, epochs=100, lr=0.01, momentum=0.9)
print(f"Final training accuracies: {metrics[0][0][-5:]}")
print(f"Final test accuracies: {metrics[0][1][-5:]}")

plot(*metrics)


#### Batch Normalization

In [None]:
torch.manual_seed(0)

model = nn.Sequential(
    nn.BatchNorm2d(3),
    nn.Conv2d(3, 32, kernel_size=3, bias=False),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.BatchNorm2d(32),
    nn.Conv2d(32, 64, kernel_size=3, bias=False),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.BatchNorm1d(64*6*6),
    nn.Linear(64*6*6, 120),
    nn.ReLU(),
    nn.BatchNorm1d(120),
    nn.Linear(120, 84),
    nn.ReLU(),
    nn.BatchNorm1d(84),
    nn.Linear(84, 10)
).to(device)

# Train the model 
metrics = train_model(model, train_loader, test_loader, epochs=10, lr=0.01, momentum=0.9)

# Plot the training progress
plot(*metrics)

# Print final accuracies
print(f"Final training accuracies: {metrics[0][0][-5:]}")
print(f"Final test accuracies: {metrics[0][1][-5:]}")
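
For intuition about what the `BatchNorm` layers compute during training, the core normalization can be written in plain Python for a single feature. This is a simplified sketch: it leaves out the learnable scale and shift and the running statistics used at evaluation time, and assumes the common `eps` value of 1e-5.

```python
from typing import List


def batch_norm(values: List[float], eps: float = 1e-5) -> List[float]:
    """Normalize one feature across a batch to roughly zero mean and unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)  # biased variance
    return [(v - mean) / (var + eps) ** 0.5 for v in values]


out = batch_norm([1.0, 2.0, 3.0, 4.0])
out_mean = sum(out) / len(out)
out_var = sum((v - out_mean) ** 2 for v in out) / len(out)
print(round(out_mean, 6), round(out_var, 3))  # 0.0 1.0
```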


### VGG-16

In [None]:
torch.manual_seed(0)

class VGG(nn.Module):
    def __init__(self):
        super(VGG, self).__init__()
        self.features = self._make_layers([64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M', 512, 512, 512, 'M', 512, 512, 512, 'M'])
        self.classifier = nn.Linear(512, 10)

    def forward(self, x):
        out = self.features(x)
        out = out.view(out.size(0), -1)
        out = self.classifier(out)
        return out

    def _make_layers(self, cfg):
        layers = []
        in_channels = 3
        for x in cfg:
            if x == 'M':
                layers += [nn.MaxPool2d(kernel_size=2, stride=2)]
            else:
                layers += [nn.Conv2d(in_channels, x, kernel_size=3, padding=1),
                           nn.BatchNorm2d(x),
                           nn.ReLU(inplace=True)]
                in_channels = x
        layers += [nn.AvgPool2d(kernel_size=1, stride=1)]
        return nn.Sequential(*layers)

# Instantiate and train the model
model = VGG().to(device)
metrics = train_model(model, train_loader, test_loader, epochs=10, lr=0.01, momentum=0.9)

# Plot the training progress
plot(*metrics)

# Print final accuracies
print(f"Final training accuracies: {metrics[0][0][-5:]}")
print(f"Final test accuracies: {metrics[0][1][-5:]}")
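
The classifier is a single `Linear(512, 10)` because the five pooling stages reduce a 32x32 input to 1x1 while the channel count grows to 512. The sketch below walks the same configuration list in plain Python, assuming 3x3 convolutions with `padding=1` keep the spatial size unchanged; `trace` is a hypothetical helper.

```python
cfg = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
       512, 512, 512, 'M', 512, 512, 512, 'M']


def trace(cfg, size=32, channels=3):
    """Return (channels, spatial size, number of conv layers) after the feature stack."""
    conv_layers = 0
    for x in cfg:
        if x == 'M':
            size //= 2          # 2x2 max pooling halves the width and height
        else:
            channels = x        # padded 3x3 conv changes only the channel count
            conv_layers += 1
    return channels, size, conv_layers


print(trace(cfg))  # (512, 1, 13)
```

The 13 convolutional layers match VGG-16; the original architecture adds three fully connected layers, whereas this CIFAR10 variant uses one.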


### ResNet-50

In [None]:
torch.manual_seed(0)

class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, in_channels, out_channels, stride=1, downsample=None):
        super(Bottleneck, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.conv3 = nn.Conv2d(out_channels, out_channels * self.expansion, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(out_channels * self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)

        return out

class ResNet(nn.Module):
    def __init__(self, block, layers, num_classes=10):
        super(ResNet, self).__init__()
        self.in_channels = 64
        self.conv1 = nn.Conv2d(3, self.in_channels, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(self.in_channels)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512 * block.expansion, num_classes)

    def _make_layer(self, block, out_channels, blocks, stride=1):
        downsample = None
        if stride != 1 or self.in_channels != out_channels * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channels, out_channels * block.expansion, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels * block.expansion),
            )

        layers = []
        layers.append(block(self.in_channels, out_channels, stride, downsample))
        self.in_channels = out_channels * block.expansion
        for _ in range(1, blocks):
            layers.append(block(self.in_channels, out_channels))

        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.fc(x)
        return x

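# Instantiate ResNet-50 (block counts 3, 4, 6, 3) and train the model
model = ResNet(Bottleneck, [3, 4, 6, 3]).to(device)
metrics = train_model(model, train_loader, test_loader, epochs=10, lr=0.01, momentum=0.9)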

# Plot the training progress
plot(*metrics)

# Print final accuracies
print(f"Final training accuracies: {metrics[0][0][-5:]}")
print(f"Final test accuracies: {metrics[0][1][-5:]}")
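
A short trace shows how quickly this ImageNet-style ResNet shrinks a 32x32 CIFAR10 image, and where the "50" in the name comes from. The sketch below is plain Python and assumes the standard ResNet-50 block counts `[3, 4, 6, 3]`; `conv_out` is a hypothetical helper, and the layer count follows the usual convention of ignoring the 1x1 downsample convolutions.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output width/height of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


blocks = [3, 4, 6, 3]          # bottleneck blocks per stage (assumed ResNet-50)
widths = [64, 128, 256, 512]   # base width of each stage
strides = [1, 2, 2, 2]         # stride of the first block in each stage
expansion = 4                  # Bottleneck.expansion

size = conv_out(32, kernel=7, stride=2, padding=3)    # stem conv -> 16
size = conv_out(size, kernel=3, stride=2, padding=1)  # max pool  -> 8

stages = []
for width, stride in zip(widths, strides):
    size = conv_out(size, kernel=3, stride=stride, padding=1)
    stages.append((width * expansion, size))

# One stem conv, three convs per bottleneck block, one final linear layer.
weight_layers = 1 + 3 * sum(blocks) + 1

print(stages)         # [(256, 8), (512, 4), (1024, 2), (2048, 1)]
print(weight_layers)  # 50
```

The feature map is already 1x1 by the last stage, so the adaptive average pooling has nothing left to reduce on inputs of this size.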
