## LeNet-5 Overview

LeNet-5 is a pioneering convolutional neural network (CNN) designed by Yann LeCun and his colleagues in 1998 for handwritten and machine-printed character recognition. It is considered one of the earliest CNNs that demonstrated the power of convolutional layers in processing data that has a grid-like topology (e.g., images).

### Innovations of LeNet-5

LeNet-5 introduced several innovative ideas that have shaped the development of later CNNs:

1. **Layered Architecture**: It used a multi-layer design comprising convolutional layers, subsampling layers (now commonly called pooling layers), and fully connected layers. This structure is now standard in CNN architectures.
2. **Convolutional Layers**: These layers use learnable kernels or filters to capture spatial hierarchies in image data. This was a significant departure from prior neural networks that fully connected every input to every output.
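3. **Subsampling Layers**: These layers average each 2x2 window of a feature map, halving its spatial dimensions and making the representation smaller and less sensitive to small shifts. In the original network the average was also scaled by a trainable coefficient and offset by a trainable bias; modern reimplementations usually use plain average pooling.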
4. **Backpropagation with Gradient Descent**: LeNet-5 was trained using backpropagation and gradient descent to update the weights of the network, a method still fundamental in training deep learning models.
5. **Tanh Activation Functions**: LeNet-5 used a scaled hyperbolic tangent (tanh) as its squashing function. Because tanh is zero-centered, with outputs in $(-1, 1)$, it tends to converge faster under gradient descent than the logistic sigmoid, whose outputs are always positive.

### Key Properties of LeNet-5

- **Architecture**: The network consists of 7 layers (not counting the input layer): 3 convolutional layers (C1, C3, C5), 2 subsampling layers (S2, S4), 1 fully connected layer (F6), and the output layer.
- **Use of Local Receptive Fields**: This allowed the network to extract elementary visual features such as edges, which are then combined in subsequent layers to detect higher-order features.
- **Shared Weights and Biases**: Every position in a feature map uses the same kernel and bias. This reduces the number of free parameters, which lowers the risk of overfitting, and lets the same feature detector respond wherever the feature appears in the input.

### Advantages of LeNet-5

- **Efficiency**: The sharing of weights in convolutional layers drastically reduces the number of parameters compared to fully connected networks of similar size.
- **Robustness to Image Translations**: Due to its convolutional nature, it can recognize patterns with some degree of shift invariance, making it robust against translation of input images.
- **Foundation for Modern CNNs**: It laid the groundwork for modern deep learning architectures and techniques used in image recognition.

### Disadvantages of LeNet-5

- **Limited to Low-Resolution Images**: LeNet-5 was designed for small grayscale inputs (32x32 pixels, with 28x28 MNIST digits padded to fit), and it might not perform well with larger or more complex images without modifications.
- **Susceptible to Overfitting**: In scenarios with limited training data, the network might overfit, although this is a common issue in many deep learning models.
- **Simplicity**: While it was state-of-the-art at the time, modern problems require deeper and more complex networks to capture detailed features in large-scale image data.

LeNet-5 remains a significant educational tool for understanding the basic concepts of convolutional neural networks and their applications in image processing.


## LeNet-5 Overview

LeNet-5, designed by Yann LeCun et al., is a foundational convolutional neural network that played a pivotal role in the advancement of machine learning and computer vision, particularly in the recognition of handwritten digits. Its architecture was revolutionary at the time of its introduction in the late 1990s.

### Innovations of LeNet-5

LeNet-5 introduced several key architectural elements that have become standard in modern CNNs:

1. **Convolutional Layers**: Employing local receptive fields and shared weights, these layers effectively capture spatial hierarchies in image data.
2. **Subsampling Layers**: Now commonly referred to as pooling layers, these reduce spatial dimensions and parameter counts, enhancing translational invariance.
3. **Activation Functions**: LeNet-5 used sigmoidal squashing functions, specifically a scaled hyperbolic tangent, which were standard before the popularization of ReLUs in later models.
4. **Fully Connected Layers**: Culminating in a classification layer, these synthesize the features extracted throughout the convolutional and pooling layers to make final predictions.

### Detailed Architecture and Parameter Calculation

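The following table lists the dimensions, configuration, and parameter count of each layer. It describes the simplified variant used in most modern reimplementations, in which C3 connects to all six S2 maps and the pooling layers have no parameters. In the original 1998 network, C3 used a sparse connection scheme (1,516 parameters) and S2 and S4 each had a trainable coefficient and bias per feature map (12 and 32 parameters).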

| Layer             | Input Dimension           | Output Dimension          | Kernel Size/Stride/Pad | Parameters Formula                                                               | Number of Parameters |
|-------------------|---------------------------|---------------------------|------------------------|----------------------------------------------------------------------------------|----------------------|
| **Input**         | $32 \times 32$ ($28 \times 28$ digit, zero-padded) | N/A                    | N/A                    | N/A                                                                              | 0                    |
| **C1: Conv1**     | $32 \times 32 \times 1$   | $28 \times 28 \times 6$   | $5 \times 5$, S=1, P=0 | $(5 \times 5 \times 1 + 1) \times 6$                                              | 156                  |
| **S2: Pooling1**  | $28 \times 28 \times 6$   | $14 \times 14 \times 6$   | $2 \times 2$, S=2       | $0$ (non-learnable)                                                              | 0                    |
| **C3: Conv2**     | $14 \times 14 \times 6$   | $10 \times 10 \times 16$  | $5 \times 5$, S=1, P=0 | $(5 \times 5 \times 6 + 1) \times 16$                                             | 2,416                |
| **S4: Pooling2**  | $10 \times 10 \times 16$  | $5 \times 5 \times 16$    | $2 \times 2$, S=2       | $0$ (non-learnable)                                                              | 0                    |
| **C5: Conv3**     | $5 \times 5 \times 16$    | $1 \times 1 \times 120$   | $5 \times 5$, S=1, P=0 | $(5 \times 5 \times 16 + 1) \times 120$                                           | 48,120               |
| **F6: Fully Connected** | $120$               | $84$                      | N/A                    | $(120 + 1) \times 84$                                                            | 10,164               |
| **Output**        | $84$                      | $10$                      | N/A                    | $(84 + 1) \times 10$                                                             | 850                  |

### Calculation Formulas

- **Parameter Formula for Conv and Fully Connected Layers**: $(K \times K \times C_{\text{in}} + 1) \times C_{\text{out}}$, where $K$ is the kernel size, $C_{\text{in}}$ is the number of input channels, and $C_{\text{out}}$ is the number of output channels. The $+1$ accounts for one bias per output channel. For fully connected layers, set $K = 1$ and read $C_{\text{in}}$ and $C_{\text{out}}$ as the number of input and output units.
- **Output Dimension for Conv Layers**: $\left\lfloor\frac{W-K+2P}{S}+1\right\rfloor \times \left\lfloor\frac{H-K+2P}{S}+1\right\rfloor$, where $W$ and $H$ are the width and height of the input, $K$ is the kernel size, $P$ is the padding, and $S$ is the stride.
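- **Output Dimension for Pooling Layers**: The same formula applies, with $K$ as the pooling window size. In LeNet-5, $K=2$, $S=2$, and $P=0$, so each pooling layer halves the width and height.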
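
As a quick check of the formulas above, the following sketch recomputes the output sizes and parameter counts from the table. The function names (`conv_output_size`, `param_count`, `trace_lenet5`) are illustrative and not part of any library, and the layer list assumes the simplified variant in the table (dense C3, parameter-free pooling).

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Output size along one axis: floor((W - K + 2P) / S) + 1."""
    return (size - kernel + 2 * padding) // stride + 1


def param_count(kernel, c_in, c_out):
    """Weights plus one bias per output channel; kernel=1 gives a fully connected layer."""
    return (kernel * kernel * c_in + 1) * c_out


def trace_lenet5(size=32, channels=1):
    """Return (name, spatial size, channels or units, parameters) for each layer."""
    # (name, kind, kernel, stride, output channels), taken from the table above
    stages = [
        ("C1", "conv", 5, 1, 6),
        ("S2", "pool", 2, 2, 6),
        ("C3", "conv", 5, 1, 16),
        ("S4", "pool", 2, 2, 16),
        ("C5", "conv", 5, 1, 120),
    ]
    rows = []
    for name, kind, kernel, stride, c_out in stages:
        out_size = conv_output_size(size, kernel, stride)
        params = param_count(kernel, channels, c_out) if kind == "conv" else 0
        rows.append((name, out_size, c_out, params))
        size, channels = out_size, c_out
    # After C5 the map is 1x1, so the channel count equals the number of units.
    for name, units in [("F6", 84), ("Output", 10)]:
        rows.append((name, 1, units, param_count(1, channels, units)))
        channels = units
    return rows


rows = trace_lenet5()
for name, out_size, width, params in rows:
    print(f"{name:6s} {out_size:2d}x{out_size:<2d} x {width:3d}  params={params}")
total = sum(row[3] for row in rows)
print("total parameters:", total)
```

Summing the last column of the table gives the same total as `total`, which is a convenient way to catch arithmetic slips when adapting the architecture.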

### Advantages of LeNet-5

- **Efficiency and Simplicity**: LeNet-5's relatively simple architecture was highly efficient on the computation resources available at the time, making it ideal for practical applications like digit recognition.
- **Pioneer of CNNs**: As one of the first successful applications of convolutional networks, it set a framework that has inspired countless modern architectures.

### Disadvantages of LeNet-5

- **Limited by Modern Standards**: In today's context, LeNet-5's simplicity is overshadowed by more advanced networks that can handle more complex tasks and larger images.
- **Lack of Modern Techniques**: It predates techniques such as ReLU, dropout, and batch normalization, which often improve training speed and generalization in contemporary neural networks.

LeNet-5 is not only a cornerstone in the evolution of deep learning but also a great educational tool for understanding the basic concepts of convolutional neural networks.



# In-depth LeNet-5 Tutorial with Detailed Layer-by-Layer Computations

Explore the mathematical intricacies of LeNet-5, one of the pioneering convolutional neural networks developed by Yann LeCun. This tutorial provides detailed descriptions of both forward and backward computations for each layer.

## LeNet-5 Architecture Overview

LeNet-5 consists of several distinct layers:
1. **Input Layer**: 32x32 input images, zero-padded from 28x28.
2. **C1 - Convolutional Layer**: 6 feature maps, 28x28 each.
3. **S2 - Subsampling Layer**: Average pooling, reduces to 14x14.
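4. **C3 - Convolutional Layer**: 16 feature maps, 10x10 each. In the original design, each map connects to only a subset of the S2 maps.
5. **S4 - Subsampling Layer**: Reduces to 5x5.
6. **C5 - Convolutional Layer**: 120 feature maps of size 1x1. Because the 5x5 kernel covers the entire 5x5 input, this layer is equivalent to a fully connected layer.
7. **F6 - Fully Connected Layer**: 84 units.
8. **Output Layer**: 10 units, one per class. The original network used Euclidean radial basis function (RBF) units here; this tutorial uses the softmax output common in modern reimplementations.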
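
The partial connectivity of C3 can be made concrete by counting parameters. The sketch below assumes the grouping described in the original 1998 paper: six C3 maps read 3 S2 maps each, nine read 4 each, and one reads all 6. The names `fan_in` and `c3_params` are illustrative.

```python
KERNEL = 5  # C3 uses 5x5 kernels

# Number of S2 maps read by each of the 16 C3 maps (assumed grouping, see above).
fan_in = [3] * 6 + [4] * 6 + [4] * 3 + [6]


def c3_params(fan_in_per_map, kernel=KERNEL):
    """One kernel per connected input map, plus one bias per output map."""
    return sum(kernel * kernel * n + 1 for n in fan_in_per_map)


sparse = c3_params(fan_in)    # partial connectivity
dense = c3_params([6] * 16)   # every C3 map reads every S2 map
print(sparse, dense)
```

The dense count matches the formula $(5 \times 5 \times 6 + 1) \times 16$, while the sparse count is noticeably smaller, which was one motivation for the partial connection scheme.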

## Detailed Layer-by-Layer Operations

### C1 - Convolutional Layer
- **Forward Pass**:
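  - **Formula**: $O_{ij}^l = \sigma(net_{ij}^l)$ with $net_{ij}^l = b^l + \sum_m \sum_{p=0}^{P-1} \sum_{q=0}^{Q-1} W_{pq}^{lm} \cdot I_{i+p,j+q}^m$, where $I^m$ is input map $m$, $O^l$ is output map $l$, $P \times Q$ is the kernel size (here $5 \times 5$), and $\sigma$ is the activation function.
- **Backward Pass**:
  - **Gradient w.r.t. input**: $\frac{\partial L}{\partial I_{uv}^m} = \sum_l \sum_{p=0}^{P-1} \sum_{q=0}^{Q-1} \frac{\partial L}{\partial O_{u-p,v-q}^l} \cdot W_{pq}^{lm} \cdot \sigma'(net_{u-p,v-q}^l)$, where terms whose output indices fall outside the feature map are dropped. Each input pixel accumulates a contribution from every output position whose window covers it.
  - **Gradient w.r.t. weights**: $\frac{\partial L}{\partial W_{pq}^{lm}} = \sum_{ij} \frac{\partial L}{\partial O_{ij}^l} \cdot I_{i+p,j+q}^m \cdot \sigma'(net_{ij}^l)$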
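
The convolutional forward and backward passes can be sketched in plain Python for a single output map. This is a minimal illustration, not an efficient implementation: it assumes $\sigma = \tanh$, uses a toy loss equal to the sum of all outputs, and the function names are hypothetical. A finite-difference comparison checks one weight gradient.

```python
import math


def dtanh(x):
    """Derivative of tanh, used here as sigma'."""
    return 1.0 - math.tanh(x) ** 2


def conv_forward(inp, weights, bias):
    """inp[m][y][x], weights[m][p][q]; returns (net, out) for one output map."""
    M, H, W = len(inp), len(inp[0]), len(inp[0][0])
    P, Q = len(weights[0]), len(weights[0][0])
    net = [[bias + sum(weights[m][p][q] * inp[m][i + p][j + q]
                       for m in range(M) for p in range(P) for q in range(Q))
            for j in range(W - Q + 1)]
           for i in range(H - P + 1)]
    out = [[math.tanh(v) for v in row] for row in net]
    return net, out


def conv_backward(inp, weights, net, grad_out):
    """Return (grad_in, grad_w) given grad_out = dL/dO for this output map."""
    M, H, W = len(inp), len(inp[0]), len(inp[0][0])
    P, Q = len(weights[0]), len(weights[0][0])
    grad_in = [[[0.0] * W for _ in range(H)] for _ in range(M)]
    grad_w = [[[0.0] * Q for _ in range(P)] for _ in range(M)]
    for i in range(len(net)):
        for j in range(len(net[0])):
            delta = grad_out[i][j] * dtanh(net[i][j])
            for m in range(M):
                for p in range(P):
                    for q in range(Q):
                        # Weight gradient sums over all output positions.
                        grad_w[m][p][q] += delta * inp[m][i + p][j + q]
                        # Input gradient accumulates over overlapping windows.
                        grad_in[m][i + p][j + q] += delta * weights[m][p][q]
    return grad_in, grad_w


inp = [[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]]]
weights = [[[0.5, -0.5], [0.25, 0.75]]]
bias = 0.1


def total_loss(w):
    """Toy loss: the sum of all outputs, so dL/dO is 1 everywhere."""
    _, o = conv_forward(inp, w, bias)
    return sum(sum(row) for row in o)


net, out = conv_forward(inp, weights, bias)
grad_out = [[1.0] * len(out[0]) for _ in out]
grad_in, grad_w = conv_backward(inp, weights, net, grad_out)

# Finite-difference check on the weight at (m=0, p=0, q=0).
eps = 1e-6
bumped = [[[0.5 + eps, -0.5], [0.25, 0.75]]]
numeric = (total_loss(bumped) - total_loss(weights)) / eps
print("analytic:", grad_w[0][0][0], "numeric:", numeric)
```

The same routine applies to C3 by passing six input maps instead of one.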

### S2 - Subsampling Layer
- **Forward Pass**:
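  - **Formula**: $O_{ij} = \sigma(net_{ij})$ with $net_{ij} = b + \sum_{p=0}^{P-1} \sum_{q=0}^{Q-1} W \cdot I_{2i+p,2j+q}$, where $P = Q = 2$ and $W$ and $b$ are a single trainable coefficient and bias per feature map.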
- **Backward Pass**:
  - **Gradient w.r.t. input**: $\frac{\partial L}{\partial I_{2i+p,2j+q}} = \frac{\partial L}{\partial O_{ij}} \cdot W \cdot \sigma'(net_{ij})$
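
A minimal sketch of this subsampling step for one feature map follows, assuming $\sigma = \tanh$ and non-overlapping 2x2 windows. The function names are illustrative. Because the windows do not overlap, every input pixel receives the gradient of exactly one output unit.

```python
import math


def subsample_forward(inp, w, b):
    """O[i][j] = tanh(b + w * sum of the 2x2 window starting at (2i, 2j))."""
    rows, cols = len(inp) // 2, len(inp[0]) // 2
    net = [[b + w * sum(inp[2 * i + p][2 * j + q]
                        for p in range(2) for q in range(2))
            for j in range(cols)]
           for i in range(rows)]
    out = [[math.tanh(v) for v in row] for row in net]
    return net, out


def subsample_input_grad(net, grad_out, w):
    """Each input in a window receives dL/dO * w * sigma'(net) of that window."""
    rows, cols = len(net), len(net[0])
    grad_in = [[0.0] * (2 * cols) for _ in range(2 * rows)]
    for i in range(rows):
        for j in range(cols):
            delta = grad_out[i][j] * w * (1.0 - math.tanh(net[i][j]) ** 2)
            for p in range(2):
                for q in range(2):
                    grad_in[2 * i + p][2 * j + q] = delta
    return grad_in


feature_map = [[1.0, 1.0, 0.0, 2.0],
               [1.0, 1.0, 2.0, 0.0],
               [0.5, 0.5, 0.0, 0.0],
               [0.5, 0.5, 0.0, 0.0]]
net, out = subsample_forward(feature_map, w=0.25, b=0.0)
grad_in = subsample_input_grad(net, [[1.0, 1.0], [1.0, 1.0]], w=0.25)
print(out)
print(grad_in[0])
```

With `w=0.25` and `b=0.0` the pre-activation is the plain average of each window, which is the form most modern reimplementations use (without the activation).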

### C3 - Convolutional Layer
- **Forward Pass**:
  - **Formula**: $O_{ij}^k = \sigma(b^k + \sum_n \sum_{p=0}^{P-1} \sum_{q=0}^{Q-1} W_{pq}^{kn} \cdot I_{i+p,j+q}^n)$
- **Backward Pass**:
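  - **Gradient w.r.t. input**: $\frac{\partial L}{\partial I_{uv}^n} = \sum_k \sum_{p=0}^{P-1} \sum_{q=0}^{Q-1} \frac{\partial L}{\partial O_{u-p,v-q}^k} \cdot W_{pq}^{kn} \cdot \sigma'(net_{u-p,v-q}^k)$, where terms whose output indices fall outside the feature map are dropped.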
  - **Gradient w.r.t. weights**: $\frac{\partial L}{\partial W_{pq}^{kn}} = \sum_{ij} \frac{\partial L}{\partial O_{ij}^k} \cdot I_{i+p,j+q}^n \cdot \sigma'(net_{ij}^k)$

### S4 - Subsampling Layer
- **Forward Pass**:
  - **Formula**: $O_{ij} = \sigma(b + \sum_{p=0}^{P-1} \sum_{q=0}^{Q-1} W \cdot I_{2i+p,2j+q})$
- **Backward Pass**:
  - **Gradient w.r.t. input**: $\frac{\partial L}{\partial I_{2i+p,2j+q}} = \frac{\partial L}{\partial O_{ij}} \cdot W \cdot \sigma'(net_{ij})$

### C5 - Convolutional Layer
- **Forward Pass**:
  - **Formula**: $O_j = \sigma(b_j + \sum_i W_{ij} \cdot I_i)$
- **Backward Pass**:
  - **Gradient w.r.t. input**: $\frac{\partial L}{\partial I_i} = \sum_j \frac{\partial L}{\partial O_j} \cdot W_{ij} \cdot \sigma'(net_j)$
  - **Gradient w.r.t. weights**: $\frac{\partial L}{\partial W_{ij}} = \frac{\partial L}{\partial O_j} \cdot I_i \cdot \sigma'(net_j)$ (no sum over $j$, since $W_{ij}$ affects only output $j$)

### F6 - Fully Connected Layer
- **Forward Pass**:
  - **Formula**: $O_j = \sigma(b_j + \sum_i W_{ij} \cdot I_i)$
- **Backward Pass**:
  - **Gradient w.r.t. input**: $\frac{\partial L}{\partial I_i} = \sum_j \frac{\partial L}{\partial O_j} \cdot W_{ij} \cdot \sigma'(net_j)$
  - **Gradient w.r.t. weights**: $\frac{\partial L}{\partial W_{ij}} = \frac{\partial L}{\partial O_j} \cdot I_i \cdot \sigma'(net_j)$ (no sum over $j$, since $W_{ij}$ affects only output $j$)
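
The fully connected forward and backward passes (shared by C5 and F6) can be sketched as follows, assuming $\sigma = \tanh$ and a toy loss equal to the sum of the outputs. The names `fc_forward` and `fc_backward` are hypothetical, and the layer sizes are shrunk to 3 inputs and 2 outputs for readability.

```python
import math


def fc_forward(inp, W, b):
    """W[i][j] connects input i to output j; returns (net, out)."""
    net = [b[j] + sum(W[i][j] * inp[i] for i in range(len(inp)))
           for j in range(len(b))]
    out = [math.tanh(v) for v in net]
    return net, out


def fc_backward(inp, W, net, grad_out):
    """Return (grad_in, grad_W, grad_b) given grad_out = dL/dO."""
    delta = [grad_out[j] * (1.0 - math.tanh(net[j]) ** 2)
             for j in range(len(net))]
    # Each input feeds every output, so its gradient sums over j.
    grad_in = [sum(delta[j] * W[i][j] for j in range(len(delta)))
               for i in range(len(inp))]
    # Each weight feeds exactly one output, so there is no sum here.
    grad_W = [[delta[j] * inp[i] for j in range(len(delta))]
              for i in range(len(inp))]
    return grad_in, grad_W, delta


inp = [0.5, -1.0, 0.25]
W = [[0.1, -0.2], [0.3, 0.4], [-0.5, 0.6]]
b = [0.0, 0.1]

net, out = fc_forward(inp, W, b)
grad_in, grad_W, grad_b = fc_backward(inp, W, net, [1.0, 1.0])

# Finite-difference check on W[1][0] with loss = sum of outputs.
eps = 1e-6
W_bumped = [row[:] for row in W]
W_bumped[1][0] += eps
numeric = (sum(fc_forward(inp, W_bumped, b)[1]) - sum(out)) / eps
print("analytic:", grad_W[1][0], "numeric:", numeric)
```

In LeNet-5 itself, `inp` would hold the 120 outputs of C5 and `W` would be a 120 by 84 matrix.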

### Output Layer - Softmax
- **Forward Pass**:
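  - **Formula**: $O_k = \frac{e^{z_k}}{\sum_{k'} e^{z_{k'}}}$, where $z_k$ is the pre-activation (logit) of output unit $k$, computed as a weighted sum of the F6 outputs plus a bias.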
- **Backward Pass**:
  - **Gradient w.r.t. the logits**: $\frac{\partial L}{\partial z_k} = O_k - y_k$, assuming the cross-entropy loss $L = -\sum_k y_k \log O_k$ (where $y_k$ is the target probability for class $k$).
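
The softmax gradient can be verified numerically. The sketch below assumes a cross-entropy loss and a one-hot target; the function names and the example logits are illustrative only.

```python
import math


def softmax(z):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target):
    """L = -sum_k y_k * log(O_k); terms with y_k = 0 are skipped."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)


def logit_grad(probs, target):
    """dL/dz_k = O_k - y_k for softmax combined with cross-entropy."""
    return [p - t for p, t in zip(probs, target)]


z = [2.0, 1.0, 0.1]
target = [1.0, 0.0, 0.0]

probs = softmax(z)
grad = logit_grad(probs, target)

# Finite-difference check on the second logit.
eps = 1e-6
z_bumped = [2.0, 1.0 + eps, 0.1]
numeric = (cross_entropy(softmax(z_bumped), target)
           - cross_entropy(probs, target)) / eps
print("analytic:", grad[1], "numeric:", numeric)
```

The gradient for the correct class is negative (its logit should increase) and the gradients for the other classes are positive, and together they sum to zero.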

## Conclusion

Each layer in LeNet-5 has distinct roles in both feature extraction and backpropagation. Understanding these detailed operations and their mathematical foundations provides a deep insight into how convolutional neural networks function and learn from data.
