# Tutorial 6: Convolutional Neural Networks (CNNs)

## Overview

Welcome to the Python tutorial on Convolutional Neural Networks (CNNs)! In this guide, we look at CNNs, one of the most important and powerful deep learning architectures for image processing. CNNs are designed to automatically and adaptively learn spatial hierarchies of features from input images.

CNNs have revolutionized computer vision tasks such as image classification, object detection, and segmentation. In this tutorial, we will explore the fundamental concepts of CNNs, their architecture, and the mathematics behind how they work. We will also walk through the implementation of a CNN in Python using PyTorch.

## Prerequisites

Before diving into this tutorial, it is recommended to have a solid understanding of the following topics:

- Python programming fundamentals
- Basics of machine learning and neural networks
- Linear algebra and calculus concepts: an understanding of matrices, vectors, and derivatives will help in grasping CNNs.

Knowledge of libraries like NumPy, PyTorch, and Matplotlib will be helpful, as we will use them in our implementations and visualizations.

## What You'll Learn

By the end of this tutorial, you will:

- Understand the fundamental building blocks of Convolutional Neural Networks (CNNs).
- Comprehend the concept of convolution and its role in learning local patterns.
- Implement a basic CNN in Python using PyTorch, a popular deep learning framework.
- Train the CNN model on a dataset for image classification.

In [None]:
from torchvision import models, datasets, transforms
from torch.utils.data import DataLoader
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
import os

## CNN model definition

```python
# Define the model
class CNN(nn.Module):
    def __init__(self, classes=10):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1)
        self.bn1 = nn.BatchNorm2d(32)

        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1)
        self.bn2 = nn.BatchNorm2d(64)

        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1)
        self.bn3 = nn.BatchNorm2d(128)

        self.pool = nn.MaxPool2d(2, 2)

        self.fc1 = nn.Linear(128 * 8 * 8, classes)
```

- The code defines a custom CNN class that inherits from `nn.Module`, which is the base class for all neural network modules in PyTorch.
- The `__init__` method is the constructor that initializes the layers of the CNN model.
- The CNN model has three convolutional layers, each followed by a batch normalization layer.
- The `nn.Conv2d` layers define the convolutional operations. The first layer takes 3 input channels (one per RGB color channel) and produces 32 output channels (32 filters), using a 3x3 kernel, a stride of 1, and padding of 1.
- The `nn.BatchNorm2d` layers perform batch normalization, which helps stabilize training by normalizing the inputs to each layer.
- The `nn.MaxPool2d` layer performs max pooling with a 2x2 kernel and a stride of 2, which halves the height and width of each feature map.
- The final `nn.Linear` layer is the fully connected layer that maps the output of the convolutional layers to the number of classes in the dataset.
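
The layer sizes above follow from the standard output-size rule for convolution and pooling, `out = floor((in + 2 * padding - kernel) / stride) + 1`. As a minimal sketch, the arithmetic below traces an assumed 64x64 input (the default `input_size` used later in `load_data`) through the three conv/pool blocks and shows where the `128 * 8 * 8` in `fc1` comes from. The helper name `conv_out_size` is hypothetical, not a PyTorch function.

```python
def conv_out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (hypothetical helper)."""
    return (size + 2 * padding - kernel) // stride + 1


size = 64  # assumed input height and width
for block in range(1, 4):
    size = conv_out_size(size, kernel=3, stride=1, padding=1)  # 3x3 conv keeps the size
    size = conv_out_size(size, kernel=2, stride=2)             # 2x2 max pool halves it
    print(f"after block {block}: {size} x {size}")

flattened = 128 * size * size  # 128 channels come out of conv3
print("fc1 input features:", flattened)
```

If the input size changes, the first argument of `nn.Linear` has to change with it, which is why the model is tied to 64x64 images.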

```python
    def forward(self, x):  # input: batch_size * 3 * 64 * 64
        batch_size = x.shape[0]

        x = self.conv1(x)  # batch_size * 32 * 64 * 64
        x = self.bn1(x)
        x = F.relu(x)
        x = self.pool(x)  # batch_size * 32 * 32 * 32

        x = self.conv2(x)  # batch_size * 64 * 32 * 32
        x = self.bn2(x)
        x = F.relu(x)
        x = self.pool(x)  # batch_size * 64 * 16 * 16

        x = self.conv3(x)  # batch_size * 128 * 16 * 16
        x = self.bn3(x)
        x = F.relu(x)
        x = self.pool(x)  # batch_size * 128 * 8 * 8

        x = x.view(batch_size, -1)  # batch_size * (128 * 8 * 8)
        x = self.fc1(x)  # batch_size * classes (10 by default)

        return x
```

- The `forward` method defines the forward pass of the CNN model, which takes the input tensor `x` (batch of images) as an argument.
- The input tensor has the shape `batch_size * 3 * 64 * 64`, where `batch_size` is the number of images in the batch, `3` is the number of channels (RGB), and `64 * 64` is the image size.
- The tensor `x` undergoes a series of operations: convolution, batch normalization, ReLU activation, and max pooling in each convolutional block.
- After the convolutional blocks, the tensor is flattened using `x.view(batch_size, -1)`, which turns each image's feature maps into a single 1D vector of length `128 * 8 * 8` so that it can be passed through the fully connected layer.
- Finally, the tensor `x` is passed through the fully connected layer `self.fc1` to obtain the final output tensor of shape `batch_size * classes` (where `classes` is the number of output classes).
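
One way to check the shape comments in `forward` is to push a dummy batch through the layers and record the shape after each stage. The sketch below is a stand-in for the `CNN` class, rebuilt with `nn.Sequential` only to keep the example short and self-contained. The helper `conv_block` and the batch size of 4 are assumptions made for illustration.

```python
import torch
import torch.nn as nn


def conv_block(in_ch, out_ch):
    # conv -> batch norm -> ReLU -> 2x2 max pool, mirroring one block of the CNN
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=1, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(),
        nn.MaxPool2d(2, 2),
    )


# Stand-in for the tutorial's CNN, written with nn.Sequential for brevity
net = nn.Sequential(
    conv_block(3, 32),
    conv_block(32, 64),
    conv_block(64, 128),
    nn.Flatten(),                # same effect as x.view(batch_size, -1)
    nn.Linear(128 * 8 * 8, 10),
)

net.eval()                          # batch norm uses running statistics here
x = torch.randn(4, 3, 64, 64)       # a fake batch of 4 RGB images
shapes = []
with torch.no_grad():
    for layer in net:
        x = layer(x)
        shapes.append(tuple(x.shape))

for shape in shapes:
    print(shape)
```

A mismatch between the flattened size and the `nn.Linear` input size would raise a shape error at the last layer, so this kind of check is a cheap way to catch it before training.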

In [24]:
# Define the model
class CNN(nn.Module):
    def __init__(self, classes = 10):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, kernel_size=3, stride=1, padding=1)
        self.bn1 = nn.BatchNorm2d(32)
        
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1)
        self.bn2 = nn.BatchNorm2d(64)
        
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, stride=1, padding=1)
        self.bn3 = nn.BatchNorm2d(128)
        
        self.pool = nn.MaxPool2d(2,2)
        
        self.fc1 = nn.Linear(128 * 8 * 8, classes)
       

    def forward(self, x):  # input: batch_size * 3 * 64 * 64
        batch_size = x.shape[0]
        
        x = self.conv1(x) # batch_size * 32 * 64 * 64
        x = self.bn1(x)
        x = F.relu(x)
        x = self.pool(x)  # batch_size * 32 * 32 * 32
        
        x = self.conv2(x) # batch_size * 64 * 32 * 32
        x = self.bn2(x)
        x = F.relu(x)
        x = self.pool(x)  # batch_size * 64 * 16 * 16
        
        x = self.conv3(x) # batch_size * 128 * 16 * 16
        x = self.bn3(x)
        x = F.relu(x)
        x = self.pool(x)  # batch_size * 128 * 8 * 8
        
        x = x.view(batch_size, -1) # batch_size * (128 * 8 * 8)
        x = self.fc1(x)            # batch_size * classes (10 by default)
        
        return x

## Load Dataset

Next, we define a function named `load_data` that loads and preprocesses the image data for the CNN model. It uses `datasets.ImageFolder` from torchvision and PyTorch's `DataLoader` class to handle data loading and batching.

```python
def load_data(data_dir="./data/", input_size=64, batch_size=36):
    # data augmentation
    data_transforms = {
        'train': transforms.Compose([
            transforms.RandomResizedCrop(input_size),
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
        ]),
        'test': transforms.Compose([
            transforms.Resize(input_size),
            transforms.CenterCrop(input_size),
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
        ]),
    }
```

- The `data_transforms` dictionary defines two different data transformations: one for training data (`'train'`) and another for test/validation data (`'test'`).
- For training data, `transforms.RandomResizedCrop` crops a random region of the input image and resizes it to `input_size` x `input_size`; `transforms.RandomHorizontalFlip` then flips the image horizontally with probability 0.5.
- For test/validation data, `transforms.Resize` scales the image so that its shorter side equals `input_size` (preserving the aspect ratio), and `transforms.CenterCrop` then cuts out the central `input_size` x `input_size` square, so all test images end up the same size.
- In both cases, `transforms.ToTensor` converts the images to PyTorch tensors with values in `[0, 1]`, and `transforms.Normalize` standardizes each channel using mean `[0.485, 0.456, 0.406]` and standard deviation `[0.229, 0.224, 0.225]`. These are the ImageNet channel statistics, which are commonly used with pre-trained models.
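
To make the normalization step concrete, the sketch below applies the same per-channel rule, `(value - mean) / std`, to a single pixel in plain Python. The helper `normalize_pixel` is hypothetical and only illustrates the arithmetic; it is not part of torchvision.

```python
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]


def normalize_pixel(rgb):
    """Apply (value - mean) / std to each channel of one pixel (illustrative helper)."""
    return [(v - m) / s for v, m, s in zip(rgb, mean, std)]


# ToTensor scales 8-bit values to [0, 1], so mid-gray (128, 128, 128) becomes about 0.502
gray = [128 / 255] * 3
print([round(v, 3) for v in normalize_pixel(gray)])

# A pixel equal to the channel means maps to zero in every channel
print(normalize_pixel(mean))
```

After normalization the values are no longer confined to `[0, 1]`: each channel is centered near zero with roughly unit spread, assuming the images resemble ImageNet statistics.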

Then we create the training and validation datasets:

```python
    image_dataset_train = datasets.ImageFolder(os.path.join(data_dir, '2-Medium-Scale'), data_transforms['train'])
    image_dataset_valid = datasets.ImageFolder(os.path.join(data_dir, 'test'), data_transforms['test'])
```

- The `datasets.ImageFolder` class is used to create datasets from image folders. It expects the data to be organized in subfolders, with each subfolder representing a class label.
- `image_dataset_train` and `image_dataset_valid` are created using `datasets.ImageFolder`. They represent the training and validation datasets, respectively.
- The `os.path.join` function is used to construct the paths to the data directories.
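
`ImageFolder` assigns integer labels by sorting the class subfolder names and numbering them from zero; the resulting mapping is available as the dataset's `class_to_idx` attribute. The sketch below mimics that rule with the standard library only, so it runs without any image files. The helper `folder_to_label` and the class names are hypothetical.

```python
import os
import tempfile


def folder_to_label(root):
    """Map each subfolder name to an integer label, in sorted order (illustrative helper)."""
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    return {name: idx for idx, name in enumerate(classes)}


with tempfile.TemporaryDirectory() as root:
    for name in ["dog", "cat", "bird"]:  # hypothetical class names
        os.makedirs(os.path.join(root, name))
    class_to_idx = folder_to_label(root)

print(class_to_idx)  # {'bird': 0, 'cat': 1, 'dog': 2}
```

Because labels depend only on folder names, the training and validation directories need the same set of class subfolders for their labels to line up.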

After creating the datasets, we wrap them in `DataLoader` objects, which are ready to be used in the training and validation loops of a CNN model.

```python
    train_loader = DataLoader(image_dataset_train, batch_size=batch_size, shuffle=True, num_workers=0)
    valid_loader = DataLoader(image_dataset_valid, batch_size=batch_size, shuffle=False, num_workers=0)

    return train_loader, valid_loader
```

- The `DataLoader` class is used to create data loaders that provide batches of data during training and validation.
- `train_loader` and `valid_loader` are created using `DataLoader`, and they represent the data loaders for the training and validation datasets, respectively.
- `batch_size` is specified as an argument to the data loaders, determining the number of samples in each batch.
- `shuffle=True` for the training data loader reshuffles the data at the beginning of each epoch, so the batches differ from epoch to epoch, which generally helps stochastic gradient descent.
- `shuffle=False` for the validation data loader means the data will not be shuffled to maintain the order of the data for evaluation.
- `num_workers=0` specifies the number of subprocesses used for data loading. Setting it to 0 means the data will be loaded in the main process.
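
The batching behavior described above can be seen on synthetic data. The sketch below uses a `TensorDataset` of 100 random tensors in place of the image folders (the dataset size and contents are assumptions made for illustration) and the same default `batch_size=36` as `load_data`.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 100 fake RGB images of size 64x64 with random labels in [0, 10)
images = torch.randn(100, 3, 64, 64)
labels = torch.randint(0, 10, (100,))
dataset = TensorDataset(images, labels)

loader = DataLoader(dataset, batch_size=36, shuffle=False, num_workers=0)

batch_sizes = [inputs.shape[0] for inputs, _ in loader]
print(len(loader), batch_sizes)  # 3 [36, 36, 28]
```

The last batch is smaller because 100 is not a multiple of 36. This is why the training loop weights each batch loss by `inputs.size(0)` before averaging over the dataset.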

In [25]:
# Load dataset
## the mean and standard deviation of the ImageNet dataset
## mean_vals = [0.485, 0.456, 0.406]
## std_vals = [0.229, 0.224, 0.225]

def load_data(data_dir = "./data/",input_size = 64,batch_size = 36):
    # data augmentation
    data_transforms = {
        'train': transforms.Compose([
            transforms.RandomResizedCrop(input_size),
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
        ]),
        'test': transforms.Compose([
            transforms.Resize(input_size),
            transforms.CenterCrop(input_size),
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
        ]),
    }
    ## Load dataset
    ## For other tasks, you may need to modify the data dir or even rewrite some part of 'data.py'
    image_dataset_train = datasets.ImageFolder(os.path.join(data_dir, '2-Medium-Scale'), data_transforms['train'])
    image_dataset_valid = datasets.ImageFolder(os.path.join(data_dir, 'test'), data_transforms['test'])

    train_loader = DataLoader(image_dataset_train, batch_size=batch_size, shuffle=True, num_workers=0)
    valid_loader = DataLoader(image_dataset_valid, batch_size=batch_size, shuffle=False, num_workers=0)

    return train_loader, valid_loader

## Model Training Definition

The `train_model` function takes the following arguments:

- `model`: The neural network model to be trained.
- `train_loader`: The data loader for the training dataset.
- `valid_loader`: The data loader for the validation dataset.
- `criterion`: The loss function to measure the model's performance during training.
- `optimizer`: The optimization algorithm to update the model's parameters.
- `num_epochs`: The number of training epochs (default value is 20).

The training procedure works as follows:

- During each training epoch, the model performs forward and backward passes on the training data and updates its parameters using the optimizer. For every batch, the input data (`inputs`) and target labels (`labels`) are first moved to the selected device (the GPU if one is available, otherwise the CPU).
- The gradients of the model's parameters are zeroed (`optimizer.zero_grad()`), and a forward pass is performed to obtain predictions (`outputs`) on the input data.
- The loss is calculated by comparing the predictions with the true labels (`criterion(outputs, labels)`).
- The gradients are computed with respect to the loss (`loss.backward()`), and the model's parameters are updated using the optimizer (`optimizer.step()`).
- After each epoch, the function evaluates the model's performance on the validation dataset.
- The average loss and accuracy for both training and validation datasets are computed and displayed for each epoch.
- The function also saves the model with the best validation accuracy as `best_model.pt`.
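
The per-batch steps listed above can be sketched in isolation. The example below runs a single optimization step on a tiny linear classifier with random data. The model, data shapes, and learning rate are assumptions chosen to keep the example fast; they are not the tutorial's CNN or dataset.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)
model = nn.Linear(8, 3)                    # tiny stand-in for the CNN
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

inputs = torch.randn(16, 8)                # fake batch: 16 samples, 8 features
labels = torch.randint(0, 3, (16,))        # fake class labels

weights_before = model.weight.detach().clone()

optimizer.zero_grad()                      # clear gradients from any previous step
outputs = model(inputs)                    # forward pass: raw class scores (logits)
loss = criterion(outputs, labels)          # compare predictions with the labels
loss.backward()                            # compute gradients of the loss
optimizer.step()                           # update the parameters

predictions = outputs.argmax(dim=1)        # predicted class for each sample
accuracy = (predictions == labels).float().mean().item()
weights_changed = not torch.equal(weights_before, model.weight)

print(f"loss={loss.item():.4f}, accuracy={accuracy:.4f}")
print("weights changed:", weights_changed)
```

`nn.CrossEntropyLoss` expects raw logits and integer class labels, so no softmax layer is needed at the end of the model.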



In [26]:
## Note: this is a basic solution for training and validation.
## Feel free to change it if you find something wrong or not good enough.

def train_model(model,train_loader, valid_loader, criterion, optimizer, num_epochs=20):        
    best_acc = 0.0
    
    for epoch in range(num_epochs):
        # train the model
        model.train(True)
        total_loss = 0.0
        total_correct = 0
        for inputs, labels in train_loader:
            # send the data to device (GPU)
            inputs = inputs.to(device)
            labels = labels.to(device)
            optimizer.zero_grad()
            outputs = model(inputs) # prediction
            loss = criterion(outputs, labels) # loss
            _, predictions = torch.max(outputs, 1) # The class with maximal probability
            loss.backward()
            optimizer.step()

            total_loss += loss.item() * inputs.size(0)
            total_correct += torch.sum(predictions == labels.data)
        train_loss = total_loss / len(train_loader.dataset)
        train_acc = total_correct.double() / len(train_loader.dataset)
        
        # validate: gradients are not needed here, so disable autograd to save memory
        model.train(False)
        total_loss = 0.0
        total_correct = 0
        with torch.no_grad():
            for inputs, labels in valid_loader:
                inputs = inputs.to(device)
                labels = labels.to(device)
                outputs = model(inputs)
                loss = criterion(outputs, labels)
                _, predictions = torch.max(outputs, 1)
                total_loss += loss.item() * inputs.size(0)
                total_correct += torch.sum(predictions == labels.data)
        valid_loss = total_loss / len(valid_loader.dataset)
        valid_acc = total_correct.double() / len(valid_loader.dataset)
        
        # Show the results
        print('*' * 100)
        print('epoch:{:d}/{:d}'.format(epoch, num_epochs))
        print("training: loss:   {:.4f}, accuracy: {:.4f}".format(train_loss, train_acc))
        print("validation: loss: {:.4f}, accuracy: {:.4f}".format(valid_loss, valid_acc))
        
        # save the best model
        if valid_acc > best_acc:
            best_acc = valid_acc
            best_model = model
            torch.save(best_model, 'best_model.pt')
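
`train_model` saves the whole model object with `torch.save(best_model, 'best_model.pt')`. A common alternative is to save only the parameters through `state_dict()` and load them into a freshly constructed model. The sketch below shows that pattern on a tiny stand-in model. The file name `best_model_state.pt` and the `nn.Linear` model are assumptions for illustration, not part of the tutorial's code.

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # tiny stand-in for the trained CNN

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "best_model_state.pt")  # hypothetical file name
    torch.save(model.state_dict(), path)             # save the parameters only

    restored = nn.Linear(4, 2)                       # rebuild the same architecture
    restored.load_state_dict(torch.load(path))       # then load the saved parameters
    restored.eval()                                  # switch to evaluation mode

x = torch.randn(3, 4)
with torch.no_grad():
    same = torch.allclose(model(x), restored(x))
print("outputs match:", same)
```

Saving the `state_dict` keeps the checkpoint independent of how the class is pickled, at the cost of having to construct the model yourself before loading.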

In [27]:
## about model
num_classes = 10

## about data
data_dir = "data" ## You may need to specify the data_dir first
input_size = 64
batch_size = 64

## about training
num_epochs = 50
lr = 0.001

## model initialization
model = CNN(classes = num_classes)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print('device:', device)
model = model.to(device)

## optimizer
optimizer = optim.SGD(model.parameters(), lr=lr, momentum=0.9)
## loss function
criterion = nn.CrossEntropyLoss()

## data preparation
train_loader, valid_loader = load_data(data_dir=data_dir,input_size=input_size, batch_size=batch_size)
# train
train_model(model,train_loader, valid_loader, criterion, optimizer, num_epochs=num_epochs)


device: cpu
****************************************************************************************************
epoch:0/50
training: loss:   6.4479, accuracy: 0.1480
validation: loss: 7.7156, accuracy: 0.2000
****************************************************************************************************
epoch:1/50
training: loss:   5.1180, accuracy: 0.2820
validation: loss: 10.2933, accuracy: 0.2700
****************************************************************************************************
epoch:2/50
training: loss:   3.6913, accuracy: 0.3160
validation: loss: 5.0701, accuracy: 0.2600
****************************************************************************************************
epoch:3/50
training: loss:   3.0899, accuracy: 0.3180
validation: loss: 3.8203, accuracy: 0.2800
****************************************************************************************************
epoch:4/50
training: loss:   2.6755, accuracy: 0.3840
validation: loss: 2.2697, accuracy: 0.38