# 05 - Convolutional Neural Networks

Convolutional Neural Networks (CNNs) have revolutionized the field of computer vision. They are designed to recognize visual patterns directly from pixel images with minimal preprocessing. CNNs are hierarchical models in which each neuron connects only to a small local region of the previous layer, somewhat like receptive fields in human vision.

CNNs are useful for finding patterns in images to recognize objects, classes, and categories.

The first few layers recognize simple visual features, like edges. Deeper layers combine these simple features into more sophisticated patterns.

![](../media/cnn/CNN_Activation-maximization.png "International Journal of Computer Vision")
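
To make the idea of an edge-detecting first layer concrete, here is a minimal sketch that is not part of the original notebook. It applies a hand-written Sobel kernel to a toy image with `F.conv2d`. The image and the kernel are illustrative assumptions; a trained CNN learns its own kernels instead of using fixed ones.

```python
import torch
import torch.nn.functional as F

# Toy 6x6 grayscale image (assumption): dark left half (0), bright right half (1).
image = torch.zeros(1, 1, 6, 6)
image[..., 3:] = 1.0

# Sobel kernel that responds to left-to-right intensity changes (vertical edges).
sobel_x = torch.tensor([[-1., 0., 1.],
                        [-2., 0., 2.],
                        [-1., 0., 1.]]).view(1, 1, 3, 3)

# Slide the kernel over the image: multiply element-wise and sum at each position.
edges = F.conv2d(image, sobel_x)

print(edges.shape)     # torch.Size([1, 1, 4, 4])
print(edges[0, 0, 0])  # tensor([0., 4., 4., 0.])
```

The response is non-zero only where the window straddles the dark-to-bright boundary, which is the behavior the first layers of a CNN tend to learn.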

In [None]:
import torch
from torchvision import transforms
import torch.optim as optim
import matplotlib.pyplot as plt
from collections import OrderedDict

In [None]:
device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(f"Using {device} device")

### Hyperparameters

In [None]:
RATIO_VALIDATION = 0.2
BATCH_SIZE = 64
LEARNING_RATE = 0.001
EPOCHS = 10

In [None]:
# Load model tools
from scripts.model_tools import train_validate, test_validate, test_validate_confusion, set_fashion_dataset

In [None]:
# Define a transformation pipeline. 
transform = transforms.Compose([transforms.ToTensor()])
train_ds, test_ds, train_dl, val_dl, test_dl, classes = set_fashion_dataset(transform, RATIO_VALIDATION, BATCH_SIZE)

A typical CNN architecture consists of:

 1. Convolutional Layers: Apply convolution operation on the input layer to detect features.
 2. Activation Layers: Introduce non-linearity to the model (typically ReLU).
 3. Pooling Layers: Perform down-sampling operations to reduce dimensionality.
 4. Fully Connected Layers: After several convolutional and pooling layers, the high-level reasoning in the neural network happens via fully connected layers.

![](../media/intro/CNN.png "python.plainenglish.io")
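
The four layer types listed above can be sketched as a tiny `nn.Sequential` stack. This is only an illustration under assumed sizes (8 filters, a 28x28 single-channel input, 10 classes), not the model trained in this notebook. Printing the shape after each layer shows how the data changes along the way.

```python
import torch
import torch.nn as nn

# Illustrative stack (sizes are assumptions, chosen for a 28x28 grayscale input).
tiny_cnn = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3),  # 1. convolution: detects local features
    nn.ReLU(),                       # 2. activation: adds non-linearity
    nn.MaxPool2d(2),                 # 3. pooling: halves height and width
    nn.Flatten(),                    # reshape feature maps into one vector per sample
    nn.Linear(8 * 13 * 13, 10),      # 4. fully connected: maps features to class scores
)

x = torch.randn(4, 1, 28, 28)  # a random batch of 4 images
shapes = []
for layer in tiny_cnn:
    x = layer(x)
    shapes.append(tuple(x.shape))
    print(f"{layer.__class__.__name__:<10} -> {tuple(x.shape)}")
```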

Let's design a basic CNN for our dataset. Instead of defining the model as a sequential setup of layers with `nn.Sequential`, we will extend `nn.Module`.

In [None]:
import torch.nn as nn
import torch.nn.functional as F

class BasicCNN(nn.Module):
    def __init__(self):
        super(BasicCNN, self).__init__()
        # Input: [batch_size, 1, 28, 28]
        # Convolution 1: in_channels=1, out_channels=32, kernel_size=3
        # Output: [batch_size, 32, 26, 26]; a 3x3 kernel without padding trims one pixel from each border
        self.conv1 = nn.Conv2d(1, 32, 3)
        
        # Input: [batch_size, 32, 13, 13], because max pooling (size 2) in forward() halves 26x26 first
        # Convolution 2: in_channels=32, out_channels=64, kernel_size=3
        # Output: [batch_size, 64, 11, 11]
        self.conv2 = nn.Conv2d(32, 64, 3)
        
        # After the second max pooling: [batch_size, 64, 5, 5], flattened to [batch_size, 64*5*5]
        self.fc1 = nn.Linear(64 * 5 * 5, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        # Input: [batch_size, 1, 28, 28]
        x = F.relu(self.conv1(x))
        # Shape: [batch_size, 32, 26, 26]
        x = F.max_pool2d(x, 2)
        # Shape: [batch_size, 32, 13, 13]
        
        x = F.relu(self.conv2(x))
        # Shape: [batch_size, 64, 11, 11]
        x = F.max_pool2d(x, 2)
        # Shape: [batch_size, 64, 5, 5]
        
        x = x.view(-1, 64 * 5 * 5) # Flattening
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        # Return raw logits: nn.CrossEntropyLoss (used for training) applies log-softmax internally
        return x

In [None]:
model = BasicCNN().to(device)
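
The shape comments in `BasicCNN` follow from the standard output-size formula for convolution and pooling layers, `floor((size + 2*padding - kernel_size) / stride) + 1`. The helper below is a hypothetical name introduced only for this sketch; it traces a 28x28 input through the two conv/pool stages to show where `64 * 5 * 5` comes from.

```python
def conv_out_size(size, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (hypothetical helper)."""
    return (size + 2 * padding - kernel_size) // stride + 1

size = 28
size = conv_out_size(size, kernel_size=3)            # conv1, no padding: 26
size = conv_out_size(size, kernel_size=2, stride=2)  # max pool: 13
size = conv_out_size(size, kernel_size=3)            # conv2, no padding: 11
size = conv_out_size(size, kernel_size=2, stride=2)  # max pool: 5 (11 / 2 rounded down)

flat_features = 64 * size * size
print(size, flat_features)  # 5 1600
```

With `padding=1` and a 3x3 kernel the formula gives back the input size, which is why a padded convolution keeps 28x28 feature maps.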

In [None]:
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)
criterion = nn.CrossEntropyLoss()

In [None]:
train_validate(model, criterion, optimizer, train_dl, val_dl, device, n_epochs=EPOCHS)

## Evaluation

Let's do a more sophisticated evaluation with a classification report and a confusion matrix. These give us more insight into the model's behavior than a single accuracy metric.

In [None]:
# Import necessary libraries:
# numpy for numerical operations
# sklearn.metrics for evaluation metrics like classification report and confusion matrix
# seaborn and matplotlib for data visualization
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt

# Set the model to evaluation mode. This is important as certain layers like dropout behave differently during training and evaluation.
model.eval()

# Lists to store all predictions and true labels
all_preds = []
all_labels = []

# We don't want to compute gradients during evaluation, hence wrap the code inside torch.no_grad()
with torch.no_grad():
    batch_acc = []
    # Iterate over all batches in the test loader
    for images, labels in test_dl:
        # Transfer images and labels to the computational device (either CPU or GPU)
        images, labels = images.to(device), labels.to(device)
        
        # Pass the images through the model to get predictions
        outputs = model(images)
        
        # Get the class with the maximum probability as the predicted class
        _, predicted = torch.max(outputs, 1)
        
        # Extend the all_preds list with predictions from this batch
        all_preds.extend(predicted.cpu().numpy())
        
        # Extend the all_labels list with true labels from this batch
        all_labels.extend(labels.cpu().numpy())

        # Compare actual labels and predicted labels, then record this batch's accuracy
        result = predicted == labels.view(predicted.shape)
        acc = result.float().mean()
        batch_acc.append(acc.item())

# Print a classification report which provides an overview of the model's performance for each class
print(classification_report(all_labels, all_preds, target_names=classes))

# Compute the confusion matrix using true labels and predictions
cm = confusion_matrix(all_labels, all_preds)

# Visualize the confusion matrix using seaborn's heatmap
plt.figure(figsize=(10,8))
sns.heatmap(cm, annot=True, fmt="d", cmap=plt.cm.Blues, xticklabels=classes, yticklabels=classes)
plt.xlabel('Predicted Label')  # x-axis label
plt.ylabel('True Label')       # y-axis label
plt.title('Confusion Matrix')  # Title of the plot
plt.show()                     # Display the plot

print(f'Test Accuracy: {torch.mean(torch.tensor(batch_acc))*100:.2f}%')

The true label axis shows the actual category of each sample, and the predicted label axis shows the output of our model. Cells off the diagonal are misclassified samples. Items that look similar, like pullovers and coats, or shirts and T-shirts, tend to be confused with each other.
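
As a small illustration of how to read such a matrix, the sketch below computes per-class recall and precision and finds the most frequent confusion. The 3x3 matrix and its counts are made up for this example; they are not results from the model above.

```python
import numpy as np

# Hypothetical confusion matrix (rows = true label, columns = predicted label).
toy_classes = ["Pullover", "Coat", "Shirt"]
cm_toy = np.array([[80, 15,  5],
                   [10, 85,  5],
                   [12,  8, 80]])

# Recall: fraction of each true class that was predicted correctly (row-wise).
recall = cm_toy.diagonal() / cm_toy.sum(axis=1)
# Precision: fraction of each predicted class that was actually correct (column-wise).
precision = cm_toy.diagonal() / cm_toy.sum(axis=0)

# Largest off-diagonal cell = most common confusion.
off_diag = cm_toy.copy()
np.fill_diagonal(off_diag, 0)
true_idx, pred_idx = np.unravel_index(off_diag.argmax(), off_diag.shape)

print("recall:", recall)  # recall: [0.8  0.85 0.8 ]
print(f"Most confused: {toy_classes[true_idx]} predicted as {toy_classes[pred_idx]}")
```

The same three lines work on the `cm` array from scikit-learn's `confusion_matrix`, since it uses the same row and column convention.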

## Regularization with dropout

Let's use dropout layers to prevent the model from becoming too reliant on any specific neuron.
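
Before building the model, here is a minimal sketch of what `nn.Dropout` does, using a toy tensor of ones as an assumed input. In training mode it zeroes each element with probability `p` and scales the survivors by `1 / (1 - p)`; in evaluation mode it passes the input through unchanged.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)  # make the random mask reproducible
drop = nn.Dropout(p=0.25)
x = torch.ones(1, 1000)

# Training mode: random elements are zeroed, the rest are scaled up.
drop.train()
y_train = drop(x)
zero_fraction = (y_train == 0).float().mean().item()
print(f"zeroed: {zero_fraction:.2f}, surviving value: {y_train.max().item():.4f}")

# Evaluation mode: dropout is a no-op.
drop.eval()
y_eval = drop(x)
print(torch.equal(y_eval, x))  # True
```

This is why calling `model.eval()` before validation or testing matters for models that contain dropout.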

In [None]:
import torch.nn as nn
import torch.nn.functional as F  # used for F.relu in the model below

In [None]:
class NetDropout(nn.Module):
    def __init__(self):
        super(NetDropout, self).__init__()
        
        # Convolutional layers
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=3, padding=1)  # Output shape: [batch_size, 32, 28, 28]
        self.conv2 = nn.Conv2d(in_channels=32, out_channels=64, kernel_size=3, padding=1) # Output shape: [batch_size, 64, 14, 14]
        
        # Max pooling layer
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2, padding=0) # Reduces spatial dimensions by half
        
        # Dropout layer
        self.dropout = nn.Dropout(0.25)  # Helps prevent overfitting
        
        # Fully connected layers
        self.fc1 = nn.Linear(64 * 7 * 7, 512)  # Flattened input to 512 output features
        self.fc2 = nn.Linear(512, 10)          # 512 input features to 10 output classes

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x))) # Shape: [batch_size, 32, 14, 14]
        x = self.pool(F.relu(self.conv2(x))) # Shape: [batch_size, 64, 7, 7]
        
        x = x.view(-1, 64 * 7 * 7)  # Flatten the tensor
        x = self.dropout(x)         # Apply dropout
        
        x = F.relu(self.fc1(x))     # First fully connected layer with ReLU activation
        x = self.fc2(x)             # Second fully connected layer
        
        # We return raw logits here instead of applying log_softmax.
        # nn.CrossEntropyLoss, used as the loss function below, applies log-softmax internally.
        # If you plan on using nn.NLLLoss instead, uncomment the line below:
        # x = F.log_softmax(x, dim=1)
        
        return x

# Instantiate the model with dropout
model_dropout = NetDropout().to(device)
model_dropout
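
The comment in `forward` relies on the fact that `nn.CrossEntropyLoss` on raw logits is equivalent to `F.log_softmax` followed by `nn.NLLLoss`. The sketch below checks this on random logits and made-up targets, which are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)           # random scores: batch of 4, 10 classes
targets = torch.tensor([0, 3, 7, 9])  # made-up class indices

# Option 1: raw logits + CrossEntropyLoss
loss_ce = nn.CrossEntropyLoss()(logits, targets)

# Option 2: explicit log-softmax + NLLLoss
loss_nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), targets)

print(torch.allclose(loss_ce, loss_nll))  # True
```

Either pairing works for training; the important part is to pick one and not mix the conventions by accident.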

In [None]:
optimizer = torch.optim.Adam(model_dropout.parameters(), lr=LEARNING_RATE)
criterion = nn.CrossEntropyLoss()

In [None]:
train_validate(model_dropout, criterion, optimizer, train_dl, val_dl, device, n_epochs=EPOCHS)

In [None]:
test_validate_confusion(model_dropout, val_dl, device, classes)

In [None]:
# Validate against test set
test_validate_confusion(model_dropout, test_dl, device, classes)

## Data Augmentation

We can add some data augmentation here as well. From the results in the Deep Neural Network notebook, we don't expect much benefit, so we'll skip it for this dataset.

**Next Notebook: [06-Resnet](06-Resnet.ipynb)**