# Convolutional neural networks (CNNs) basics

Welcome to the `07_cnn_basics` notebook. This notebook is part of a portfolio designed to showcase essential concepts and techniques in PyTorch, with a focus on convolutional neural networks (CNNs). CNNs are widely used in computer vision tasks such as image classification. 

This notebook explores topics such as setting up the environment, loading and preprocessing image datasets, building and training a simple CNN model, evaluating its performance, and visualizing learned features. It also includes methods for improving the model using regularization techniques.

## Table of contents

1. [Understanding CNNs](#understanding-cnns)
2. [Setting up the environment](#setting-up-the-environment)
3. [Loading and preprocessing the dataset](#loading-and-preprocessing-the-dataset)
4. [Building a simple CNN model](#building-a-simple-cnn-model)
5. [Training the CNN model](#training-the-cnn-model)
6. [Evaluating the CNN model](#evaluating-the-cnn-model)
7. [Visualizing intermediate outputs and filters](#visualizing-intermediate-outputs-and-filters)
8. [Improving the model with regularization](#improving-the-model-with-regularization)
9. [Conclusion](#conclusion)

## Understanding CNNs

Convolutional Neural Networks (CNNs) are a specialized type of artificial neural network designed for handling structured grid-like data, such as images. They are commonly used in computer vision tasks, including image classification, object detection, and segmentation. CNNs excel at automatically capturing spatial hierarchies in the data through the use of convolutional operations, which make them particularly well-suited for tasks involving visual data.

### **Key building blocks of CNNs**

CNNs consist of a series of core components that work together to process input data and extract meaningful features:

#### **Convolutional layer**  
The convolutional layer is the core of any CNN architecture. It applies a set of filters (also called kernels) to the input data, such as an image. Each filter slides across the input; at every position it is multiplied element-wise with the local patch it covers, and the products are summed into a single value. Collected over all positions, these values form a feature map, which captures the presence of certain patterns, such as edges or textures, in the image.

Several key ideas make convolutional layers efficient:
- **Local receptive field**: Neurons in CNNs are connected only to a small, localized region of the input data, preserving spatial relationships and reducing the number of parameters.
- **Shared weights**: All neurons in a feature map share the same set of weights, allowing CNNs to detect patterns anywhere in the image.
- **Strides**: The step size at which the filter moves across the input. Increasing the stride size reduces the spatial dimensions of the output feature maps.
- **Padding**: Zero padding is often applied around the border of the input to control the output size.
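
How stride and padding jointly set the output size can be made concrete with the standard per-dimension formula $\lfloor (n + 2p - k) / s \rfloor + 1$. The sketch below is a plain-Python illustration rather than PyTorch code; the function name `conv_output_size` is made up for this example, and it assumes a square input, a square filter, and no dilation.

```python
def conv_output_size(n, k, stride=1, padding=0):
    """Output size along one spatial dimension.

    n: input size, k: filter size (both assumed square, no dilation).
    """
    return (n + 2 * padding - k) // stride + 1


# A 32x32 input with a 3x3 filter under three common settings.
same = conv_output_size(32, 3, stride=1, padding=1)     # padding keeps the size
valid = conv_output_size(32, 3, stride=1, padding=0)    # no padding shrinks it
strided = conv_output_size(32, 3, stride=2, padding=1)  # stride 2 halves it

print(same, valid, strided)
```

With a 3x3 filter, a padding of 1 preserves the 32x32 size, no padding trims one pixel from each border, and a stride of 2 halves each dimension.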

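Mathematically, the convolution operation for a $k \times k$ filter is expressed as:

$$
h(x, y) = \sum_{i=0}^{k-1} \sum_{j=0}^{k-1} w_{ij} \cdot X(x+i, y+j)
$$

where $h(x, y)$ is the feature map value at position $(x, y)$, $w_{ij}$ are the filter weights, and $X$ is the input data. Strictly speaking this is a cross-correlation (the filter is not flipped), which is what deep learning libraries, PyTorch included, implement under the name convolution.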

#### **Activation functions**  
After the convolution operation, an activation function introduces non-linearity to the network. This is crucial for CNNs because the convolution operation itself is linear, and non-linearity allows the network to model complex patterns in the data.

The most commonly used activation function in CNNs is the **ReLU** (Rectified Linear Unit):

$$
f(x) = \max(0, x)
$$

ReLU is favored because it is cheap to compute and its gradient is exactly 1 for every positive input, which helps mitigate the vanishing gradient problem and tends to speed up training.

#### **Pooling layers**  
Pooling layers downsample the spatial dimensions of the feature maps, which reduces computational cost and can help limit overfitting. Pooling is typically applied after a convolutional layer and its activation function.

The two most common types of pooling are:
- **Max pooling**: Retains the maximum value from each patch of the feature map.
- **Average pooling**: Computes the average value of each patch.

Max pooling is more commonly used as it captures the most prominent features in a region of the image.

#### **Fully connected layers**  
Once the convolutional and pooling layers have extracted high-level features from the input data, fully connected layers perform the final classification or regression task. Each neuron in a fully connected layer is connected to every neuron in the previous layer. In a classification task, the output of the last fully connected layer is typically passed through a **softmax** function to obtain probabilities across the different classes. In PyTorch, `nn.CrossEntropyLoss` expects raw scores (logits) and applies log-softmax internally, so the model itself usually omits the softmax layer during training.

### **Architectural properties of CNNs**

CNNs possess several architectural properties that make them highly effective for image-based tasks:

- **Parameter sharing**: By sharing weights in convolutional layers, CNNs can detect features like edges regardless of their location in the image, significantly reducing the number of parameters compared to fully connected networks.
- **Sparse connectivity**: Neurons in convolutional layers are connected to only a small subset of the input, which decreases the computational cost and improves efficiency.
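- **Translation equivariance and approximate invariance**: Convolution is translation-equivariant, meaning that shifting the input shifts the feature map by the same amount. Pooling then adds a degree of invariance to small shifts, which helps the network recognize objects even if their position in the image changes.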
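
The saving from parameter sharing can be made tangible by counting weights. The sketch below compares a convolutional layer with a fully connected layer that produces an output of the same size; the layer sizes (a 32x32 RGB input, 16 filters of size 3x3) and the helper names are illustrative assumptions, not values taken from this notebook.

```python
def conv_params(in_channels, out_channels, k):
    """Weights plus one bias per filter for a k x k convolutional layer."""
    return out_channels * (in_channels * k * k + 1)


def dense_params(n_in, n_out):
    """Weights plus one bias per output for a fully connected layer."""
    return n_in * n_out + n_out


# Hypothetical layer: 32x32 RGB input mapped to 16 feature maps of 32x32.
height, width, c_in, c_out, k = 32, 32, 3, 16, 3

conv_count = conv_params(c_in, c_out, k)
dense_count = dense_params(height * width * c_in, height * width * c_out)

print(f"convolutional layer:   {conv_count:,} parameters")
print(f"fully connected layer: {dense_count:,} parameters")
```

Under these assumptions the convolutional layer needs a few hundred parameters, while the equivalent fully connected layer needs tens of millions.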

### **Training a CNN**

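The process of training a CNN is similar to that of a traditional neural network, involving backpropagation and gradient descent. The main difference is that each filter weight is shared across all spatial positions, so its gradient accumulates contributions from every position where the filter was applied.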

- **Forward pass**: The input (e.g., an image) is passed through the network, where each convolutional and pooling layer transforms it into a set of high-level feature maps.
- **Loss calculation**: The final output is compared to the true labels using a loss function, such as cross-entropy for classification tasks.
- **Backpropagation**: Gradients of the loss with respect to the weights in the convolutional and fully connected layers are computed using the chain rule, and the weights are updated to minimize the loss.

### **Popular CNN architectures**

Over the years, several CNN architectures have gained popularity due to their success in various challenges and applications:
- **LeNet**: One of the earliest CNN architectures, designed for digit classification.
- **AlexNet**: Pioneered deeper CNNs and won the ImageNet competition in 2012, sparking widespread interest in deep learning.
- **VGG**: Introduced deeper networks with small filters (3x3) and uniform layer structures.
- **ResNet**: Introduced residual connections, allowing for much deeper networks without the degradation of performance.

### **Challenges in CNNs**

Despite their success, CNNs have some limitations:
- **Data requirements**: CNNs require large amounts of labeled data to avoid overfitting and to generalize well to new data.
- **Computational cost**: Training deep CNNs on high-resolution images can be computationally expensive.
- **Lack of rotation invariance**: CNNs are not inherently invariant to rotations or changes in scale, although this can be addressed through data augmentation techniques.

### **Applications of CNNs**

CNNs are widely used in many tasks beyond image classification, including:
- **Object detection**: CNN-based architectures like YOLO and Faster R-CNN detect objects in images and videos.
- **Image segmentation**: Architectures such as U-Net and fully convolutional networks (FCNs) are used for tasks like semantic segmentation.
- **Generative tasks**: Generative adversarial networks (GANs) use CNNs to generate realistic images.
- **Video analysis**: CNNs, often combined with recurrent neural networks (RNNs), are applied to video analysis for tasks like action recognition.

### **Maths**

#### **Structure of CNNs**

##### **Convolutional layer**
In CNNs, the convolutional layer performs the convolution operation, which is the mathematical basis of the feature extraction process. The operation applies filters to the input data, moving them across the image and computing a weighted sum of the input pixels in each region, followed by the addition of a bias term. The grid of these weighted sums, one per filter position, is known as a **feature map**.

For a $k \times k$ filter with weights $W$ and bias $b$, the value at position $(i, j)$ of the feature map computed from the input $X$ can be expressed as:
$$
S(i,j) = (X * W)(i,j) + b = \sum_{m=0}^{k-1} \sum_{n=0}^{k-1} W(m,n) \cdot X(i+m,j+n) + b
$$
where:
- $X(i+m,j+n)$ is the pixel value in the local region of the input.
- $W(m,n)$ is the filter weight.
- $b$ is the bias term, added once per output position.
- $k$ is the side length of the filter (e.g., $k = 3$ for a 3x3 filter).
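
The double sum above translates almost line for line into nested loops. The following is a minimal plain-Python sketch, assuming a single-channel input, a square filter, stride 1, and no padding; the function name `conv2d_valid` and the toy image are made up for illustration.

```python
def conv2d_valid(X, W, b=0.0):
    """Compute S(i, j) = sum_m sum_n W[m][n] * X[i+m][j+n] + b.

    Assumes a single channel, a square k x k filter, stride 1, no padding.
    """
    k = len(W)
    out_h = len(X) - k + 1
    out_w = len(X[0]) - k + 1
    S = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = b
            for m in range(k):
                for n in range(k):
                    total += W[m][n] * X[i + m][j + n]
            row.append(total)
        S.append(row)
    return S


# Toy 4x4 image: dark on the left, bright on the right.
X = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# 2x2 filter that responds to a dark-to-bright transition from left to right.
W = [
    [-1, 1],
    [-1, 1],
]

S = conv2d_valid(X, W)
for row in S:
    print(row)
```

The feature map is non-zero only in the middle column, which is where the filter straddles the vertical edge in the toy image.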

##### **Pooling layer**
Pooling layers reduce the spatial dimensions of the feature maps by downsampling, which helps reduce computational complexity and avoid overfitting. The two main pooling operations are:

- **Max pooling**: Takes the maximum value within a local region.
- **Average pooling**: Takes the average value within a local region.

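Mathematically, for a pooling window of size $p \times p$ moved with stride $s$, max pooling at output position $(i, j)$ is:
$$
P_{\text{max}}(i,j) = \max\{S(s \cdot i+m,\; s \cdot j+n) : 0 \leq m,n < p\}
$$
where $S$ is the input feature map, and $m, n$ are the indices within the pooling window. With $s = 1$ the windows overlap and the output is only slightly smaller than the input; the common choice $s = p$ uses non-overlapping windows and shrinks each spatial dimension by a factor of $p$.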
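
To see the max pooling definition in action, here is a small plain-Python sketch. The function name `max_pool2d` is hypothetical, and the `stride` argument is an assumption added for this example: it defaults to the window size $p$ (non-overlapping windows, which is also the default of PyTorch's `nn.MaxPool2d`), while `stride=1` slides the window one position at a time.

```python
def max_pool2d(S, p, stride=None):
    """Max pooling over p x p windows of a 2-D feature map S."""
    if stride is None:
        stride = p  # non-overlapping windows
    out_h = (len(S) - p) // stride + 1
    out_w = (len(S[0]) - p) // stride + 1
    return [
        [
            max(
                S[stride * i + m][stride * j + n]
                for m in range(p)
                for n in range(p)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


S = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 3],
]

P = max_pool2d(S, 2)                  # 4x4 -> 2x2
P_overlap = max_pool2d(S, 2, stride=1)  # 4x4 -> 3x3

print(P)
print(P_overlap)
```

With the default stride the 4x4 map is reduced to 2x2, keeping only the largest value of each quadrant.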

##### **Fully connected layer**
Fully connected layers in CNNs behave similarly to those in traditional neural networks. Each neuron computes a weighted sum of its inputs, plus a bias term, followed by an activation function:
$$
z_j = \sum_{i=1}^{n} w_{ij} a_i + b_j
$$
where:
- $a_i$ is the $i$-th of the $n$ inputs to the neuron.
- $w_{ij}$ is the weight connecting input $i$ to neuron $j$.
- $b_j$ is the bias term.
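
As a concrete instance of the weighted sum above, the sketch below evaluates $z_j$ for a layer with three inputs and two neurons. The name `dense_forward` and the numbers are illustrative assumptions. The weights are indexed as `w[i][j]` to match the formula; note that PyTorch's `nn.Linear` stores its weight matrix the other way round, with shape `(out_features, in_features)`.

```python
def dense_forward(a, w, b):
    """Compute z_j = sum_i w[i][j] * a[i] + b[j] for every neuron j."""
    n_in, n_out = len(a), len(b)
    return [
        sum(w[i][j] * a[i] for i in range(n_in)) + b[j]
        for j in range(n_out)
    ]


a = [1.0, 2.0, 3.0]        # flattened features from earlier layers
w = [
    [0.5, -1.0],           # weights from input 0 to neurons 0 and 1
    [0.0, 1.0],            # weights from input 1
    [1.0, 0.5],            # weights from input 2
]
b = [0.5, -0.5]

z = dense_forward(a, w, b)
print(z)
```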

#### **Forward propagation in CNNs**

Forward propagation in CNNs involves passing the input data through the layers, starting from the convolutional layers to the fully connected layers, to generate the output prediction.

##### **Convolutional operation**
In a convolutional layer, the convolution operation extracts features from the input using filters. As described above, the convolution operation can be written as:
$$
S(i,j) = \sum_{m=0}^{k-1} \sum_{n=0}^{k-1} W(m,n) \cdot X(i+m,j+n) + b
$$
where the result $S(i,j)$ is the pre-activation value of one unit in the feature map.

##### **Activation function**
The output of each convolution is passed through a non-linear activation function, such as the **ReLU** (Rectified Linear Unit):
$$
f(z) = \max(0, z)
$$
ReLU introduces non-linearity into the network, allowing it to learn more complex patterns.
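
Applied to a feature map, ReLU simply clips negative values to zero. The short sketch below also shows its derivative, which is what backpropagation uses; taking the derivative at exactly zero to be 0 is a convention assumed here, and the helper names are made up for this example.

```python
def relu(z):
    """f(z) = max(0, z)."""
    return max(0.0, z)


def relu_grad(z):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise (0 at z == 0 by convention)."""
    return 1.0 if z > 0 else 0.0


S = [
    [-2.0, 0.5],
    [3.0, -0.1],
]

A = [[relu(v) for v in row] for row in S]       # activations
G = [[relu_grad(v) for v in row] for row in S]  # local gradients

print(A)
print(G)
```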

##### **Pooling operation**
After applying the activation function, pooling layers downsample the feature maps by taking the maximum or average values from local regions, as previously defined.

#### **Loss function in CNNs**

The loss function measures the difference between the predicted output and the actual target. For classification tasks, **cross-entropy loss** is often used:
$$
L = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{C} y_{ij} \log(\hat{y}_{ij})
$$
where:
- $y_{ij}$ is 1 if sample $i$ belongs to class $j$ and 0 otherwise (a one-hot label).
- $\hat{y}_{ij}$ is the predicted probability of class $j$ for sample $i$.
- $N$ is the number of samples, and $C$ is the number of classes.
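
The loss above can be checked by hand on a tiny batch. The sketch below is a plain-Python illustration with one-hot labels; the function names, the clipping constant `eps` (added to avoid `log(0)`), and the example probabilities are assumptions made for this example. A numerically stable softmax is included to show where the predicted probabilities would come from.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities; subtracting the max avoids overflow."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean over samples of -sum_j y_ij * log(y_hat_ij), with one-hot y_true."""
    total = 0.0
    for y_row, p_row in zip(y_true, y_prob):
        for y, p in zip(y_row, p_row):
            if y:  # only the true class contributes for one-hot labels
                total -= y * math.log(max(p, eps))
    return total / len(y_true)


# Two samples, three classes; each sample gives probability 0.5 to its true class.
y_true = [[1, 0, 0], [0, 1, 0]]
y_prob = [[0.5, 0.25, 0.25], [0.25, 0.5, 0.25]]

loss = cross_entropy(y_true, y_prob)
uniform = softmax([2.0, 2.0, 2.0])

print(loss)     # equals log(2), since both true classes have probability 0.5
print(uniform)  # equal logits give equal probabilities
```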

#### **Backpropagation in CNNs**

Backpropagation adjusts the weights of the filters and fully connected layers to minimize the loss function. The gradients of the loss with respect to the weights are calculated using the chain rule.

##### **Gradient descent for filters**
For a weight $W(m,n)$ in a convolutional layer, the weight update using gradient descent is:
$$
W(m,n) \leftarrow W(m,n) - \eta \frac{\partial L}{\partial W(m,n)}
$$
where:
- $\eta$ is the learning rate.
- $\frac{\partial L}{\partial W(m,n)}$ is the gradient of the loss with respect to the filter weight.

##### **Gradient calculation**
The gradient of the loss with respect to a filter weight in a convolutional layer is computed as:
$$
\frac{\partial L}{\partial W(m,n)} = \sum_{i=0}^{h-1} \sum_{j=0}^{w-1} \frac{\partial L}{\partial S(i,j)} \cdot \frac{\partial S(i,j)}{\partial W(m,n)}
$$
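where:
- $h$ and $w$ are the height and width of the feature map $S$.
- $\frac{\partial L}{\partial S(i,j)}$ is the gradient of the loss with respect to the feature map at position $(i,j)$.
- $\frac{\partial S(i,j)}{\partial W(m,n)} = X(i+m,j+n)$ is the derivative of the convolutional operation, i.e., the input pixel that $W(m,n)$ multiplies when computing $S(i,j)$.

Because the same filter weight is used at every position, its gradient sums the contributions from all output positions.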
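
One way to gain confidence in this gradient formula is to compare it with a finite-difference estimate. The sketch below does that in plain Python under several stated assumptions: a single 2x2 filter without bias, a toy loss $L = \frac{1}{2} \sum_{i,j} S(i,j)^2$ chosen only because its gradient with respect to $S$ is $S$ itself, and helper names invented for this example.

```python
def conv2d_valid(X, W, b=0.0):
    """Feature map S for a square filter, stride 1, no padding."""
    k = len(W)
    return [
        [
            b + sum(W[m][n] * X[i + m][j + n] for m in range(k) for n in range(k))
            for j in range(len(X[0]) - k + 1)
        ]
        for i in range(len(X) - k + 1)
    ]


def loss(X, W):
    """Toy loss: half the sum of squared feature map values."""
    S = conv2d_valid(X, W)
    return 0.5 * sum(v * v for row in S for v in row)


def filter_gradient(X, dL_dS, k):
    """dL/dW(m, n) = sum_i sum_j dL/dS(i, j) * X(i + m, j + n)."""
    h, w = len(dL_dS), len(dL_dS[0])
    return [
        [
            sum(dL_dS[i][j] * X[i + m][j + n] for i in range(h) for j in range(w))
            for n in range(k)
        ]
        for m in range(k)
    ]


def numerical_gradient(X, W, eps=1e-5):
    """Central finite differences, one filter weight at a time."""
    grad = []
    for m in range(len(W)):
        row = []
        for n in range(len(W[0])):
            W_plus = [r[:] for r in W]
            W_minus = [r[:] for r in W]
            W_plus[m][n] += eps
            W_minus[m][n] -= eps
            row.append((loss(X, W_plus) - loss(X, W_minus)) / (2 * eps))
        grad.append(row)
    return grad


X = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0], [2.0, 1.0, 1.0]]
W = [[1.0, -1.0], [0.5, 2.0]]

dL_dS = conv2d_valid(X, W)  # for this toy loss, dL/dS equals S
analytic = filter_gradient(X, dL_dS, k=2)
numeric = numerical_gradient(X, W)
max_diff = max(
    abs(a - n) for ra, rn in zip(analytic, numeric) for a, n in zip(ra, rn)
)
print(analytic)
print(max_diff)
```

The analytic and numerical gradients should agree up to floating-point rounding, which is the same kind of check that automatic differentiation libraries make unnecessary in day-to-day work.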

#### **Training a CNN**

Training a CNN involves the following steps:
1. **Initialize weights**: The weights of the filters and fully connected layers are initialized randomly.
2. **Forward propagation**: The input passes through the convolutional, pooling, and fully connected layers to generate predictions.
3. **Compute loss**: The loss function calculates the difference between predicted and true labels.
4. **Backpropagation**: The gradients of the loss with respect to the weights are computed.
5. **Update weights**: The weights are adjusted using gradient descent or a variant of it, such as Adam.
6. **Repeat**: Steps 2 to 5 are repeated over the training data for several epochs, until the loss stops improving.
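
The six steps can be traced in a deliberately tiny example. The sketch below is not a CNN: it learns a single 1-D filter of length 2 with full-batch gradient descent and a mean squared error loss instead of cross-entropy, in plain Python. The signal, the "true" filter `[1.0, -1.0]`, the learning rate, and all names are assumptions chosen so that the loop is short and converges quickly.

```python
import random


def conv1d_valid(x, w):
    """1-D analogue of the convolution formula: s_i = sum_m w[m] * x[i + m]."""
    k = len(w)
    return [sum(w[m] * x[i + m] for m in range(k)) for i in range(len(x) - k + 1)]


# Toy data: targets are produced by a known difference filter.
x = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0, 0.0]
true_w = [1.0, -1.0]
y = conv1d_valid(x, true_w)

rng = random.Random(0)
w = [rng.uniform(-0.5, 0.5) for _ in range(2)]  # 1. initialize weights randomly
eta = 0.5                                       # learning rate
loss_history = []

for epoch in range(50):                         # 6. repeat for several epochs
    s = conv1d_valid(x, w)                      # 2. forward propagation
    errors = [si - yi for si, yi in zip(s, y)]
    loss = sum(e * e for e in errors) / len(errors)  # 3. compute loss (MSE)
    loss_history.append(loss)
    grad = [                                    # 4. backpropagation (by hand)
        2.0 / len(errors) * sum(e * x[i + m] for i, e in enumerate(errors))
        for m in range(len(w))
    ]
    w = [wm - eta * gm for wm, gm in zip(w, grad)]   # 5. update weights

print(w)
print(loss_history[0], loss_history[-1])
```

In PyTorch the hand-written gradient in step 4 is replaced by `loss.backward()` and step 5 by an optimizer's `step()` call, but the structure of the loop is the same.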

#### **Regularization in CNNs**

Regularization techniques are used to prevent overfitting in CNNs:

- **L2 regularization** adds a penalty to the loss function proportional to the sum of the squares of the weights:
  $$
  L_{\text{ridge}} = L + \lambda \sum_{j} W_j^2
  $$
  where $\lambda$ is the regularization parameter.

- **Dropout**: During training, dropout randomly sets a fraction of the activations to zero in each layer where it is applied, reducing the network’s reliance on any specific neuron and improving generalization. At evaluation time, dropout is switched off.
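
Both techniques are simple enough to sketch in a few lines of plain Python. The function names below are made up for this example. The dropout variant shown is "inverted" dropout, which rescales the surviving activations by $1/(1-p)$ during training so that nothing needs to change at evaluation time; this is also how PyTorch's `nn.Dropout` behaves.

```python
import random


def l2_penalty(weights, lam):
    """Regularization term: lambda * sum_j W_j^2."""
    return lam * sum(w * w for w in weights)


def dropout(activations, p, rng, training=True):
    """Inverted dropout: zero each activation with probability p, rescale the rest."""
    if not training or p == 0.0:
        return list(activations)
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)  # seeded so the example is repeatable
activations = [1.0] * 10

dropped = dropout(activations, p=0.5, rng=rng)                   # training mode
unchanged = dropout(activations, p=0.5, rng=rng, training=False)  # evaluation mode
penalty = l2_penalty([1.0, -2.0], lam=0.1)

print(dropped)    # a mix of 0.0 and 2.0
print(unchanged)  # identical to the input
print(penalty)
```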

## Setting up the environment

##### **Q1: How do you install the required libraries and dependencies to work with PyTorch and CNNs?**

##### **Q2: How do you set device configurations (CPU/GPU) in PyTorch?**

##### **Q3: How do you check the versions of PyTorch and other relevant libraries installed in your environment?**


##### **Q4: How do you set the random seed in PyTorch to ensure reproducibility?**

## Loading and preprocessing the dataset

##### **Q5: How do you load a dataset like CIFAR-10 or MNIST using torchvision in PyTorch?**


##### **Q6: How do you normalize an image dataset for input into a CNN in PyTorch?**


##### **Q7: How do you split a dataset into training, validation, and test sets using PyTorch?**


##### **Q8: How do you apply data augmentation techniques to increase the diversity of your training data in PyTorch?**


##### **Q9: How do you create a DataLoader in PyTorch for efficient data loading?**


##### **Q10: How do you apply common preprocessing techniques such as resizing and normalization to an image dataset in PyTorch?**

## Building a simple CNN model

##### **Q11: How do you define a simple CNN architecture in PyTorch using `nn.Module`?**


##### **Q12: How do you add convolutional layers to your model in PyTorch, and which parameters do you need to specify?**


##### **Q13: How do you implement pooling layers in a CNN using PyTorch?**


##### **Q14: How do you add ReLU activation functions to your CNN model in PyTorch?**


##### **Q15: How do you flatten the output of convolutional layers to feed into fully connected layers in PyTorch?**


##### **Q16: How do you initialize weights for the layers of your CNN model in PyTorch?**

## Training the CNN model

##### **Q17: How do you define the cross-entropy loss function for a classification task in PyTorch?**


##### **Q18: How do you select and implement an optimizer, such as Adam, for training a CNN in PyTorch?**


##### **Q19: How do you implement the training loop in PyTorch to update model weights during CNN training?**


##### **Q20: How do you monitor and visualize the training progress, such as loss and accuracy, during training in PyTorch?**

## Evaluating the CNN model

##### **Q21: How do you evaluate the performance of your trained CNN on a validation or test set in PyTorch?**


##### **Q22: How do you calculate and visualize a confusion matrix for your CNN model's predictions in PyTorch?**


##### **Q23: How do you detect overfitting during CNN model evaluation by analyzing training and validation losses?**


##### **Q24: How do you visualize the classification results of a CNN model on test data in PyTorch?**


##### **Q25: How do you compute precision, recall, and F1-score for your CNN model's predictions in PyTorch?**

## Visualizing intermediate outputs and filters

##### **Q26: How do you extract and visualize the output of a specific layer in your CNN model during inference in PyTorch?**


##### **Q27: How do you visualize the learned filters of a convolutional layer in a trained CNN model using PyTorch?**


##### **Q28: How do you visualize the feature maps produced by the convolutional layers of a CNN in PyTorch?**


##### **Q29: How do you use matplotlib to plot and analyze the filters and feature maps of a CNN model in PyTorch?**


##### **Q30: How do you visualize the output of the last convolutional layer before flattening in your CNN model using PyTorch?**

## Improving the model with regularization

##### **Q31: How do you implement dropout in a CNN model in PyTorch to reduce overfitting?**


##### **Q32: How do you apply weight decay to your CNN model in PyTorch using an optimizer?**


##### **Q33: How do you implement data augmentation to improve generalization of your CNN model in PyTorch?**


##### **Q34: How do you adjust hyperparameters such as learning rate and batch size to optimize CNN performance in PyTorch?**


##### **Q35: How do you implement early stopping in PyTorch to prevent overfitting during CNN training?**


##### **Q36: How do you analyze the impact of regularization techniques like dropout and weight decay on the validation loss during CNN training in PyTorch?**

## Conclusion