<div style="display: flex; justify-content: space-between; align-items: center;">
    <div style="text-align: left; flex: 4">
        <strong>Author:</strong> Amirhossein Heydari - 
        📧 <a href="mailto:amirhosseinheydari78@gmail.com">amirhosseinheydari78@gmail.com</a> - 
        🐙 <a href="https://github.com/mr-pylin/pytorch-workshop" target="_blank" rel="noopener">github.com/mr-pylin</a>
    </div>
    <div style="text-align: right; flex: 1;">
        <a href="https://pytorch.org/" target="_blank" rel="noopener noreferrer">
            <img src="../assets/images/pytorch/logo/pytorch-logo-dark.svg" 
                 alt="PyTorch Logo"
                 style="max-height: 48px; width: auto; background-color: #ffffff; border-radius: 8px;">
        </a>
    </div>
</div>
<hr>


**Table of contents**<a id='toc0_'></a>    
- [Dependencies](#toc1_)    
- [Convolutional Neural Network (CNN)](#toc2_)    
  - [Convolution vs. Correlation](#toc2_1_)    
  - [Basic Concepts](#toc2_2_)    
    - [Padding](#toc2_2_1_)    
    - [Stride](#toc2_2_2_)    
    - [Dilation](#toc2_2_3_)    
  - [Popular CNN Architectures](#toc2_3_)    
    - [Classic / Foundational](#toc2_3_1_)    
    - [Deeper & Structured CNNs](#toc2_3_2_)    
    - [Efficient / Modern CNNs](#toc2_3_3_)    
  - [Convolution in PyTorch](#toc2_4_)    
    - [1D Correlation](#toc2_4_1_)    
    - [2D Correlation](#toc2_4_2_)    
  - [CNN Implementation](#toc2_5_)    
    - [Using PyTorch](#toc2_5_1_)    

<!-- vscode-jupyter-toc-config
	numbering=false
	anchor=true
	flat=false
	minLevel=1
	maxLevel=6
	/vscode-jupyter-toc-config -->
<!-- THIS CELL WILL BE REPLACED ON TOC UPDATE. DO NOT WRITE YOUR TEXT IN THIS CELL -->

# <a id='toc1_'></a>[Dependencies](#toc0_)


In [None]:
import matplotlib.pyplot as plt
import torch
import torch.nn.functional as F
from torch import nn
from torchinfo import summary

In [None]:
# disable automatic figure display (plt.show() required)  
# this ensures consistency with .py scripts and gives full control over when plots appear
plt.ioff()

In [None]:
# set a seed for deterministic results
seed = 42
torch.manual_seed(seed)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

In [None]:
# check if cuda is available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# log
device

# <a id='toc2_'></a>[Convolutional Neural Network (CNN)](#toc0_)


## <a id='toc2_1_'></a>[Convolution vs. Correlation](#toc0_)

- Convolution and correlation both slide a kernel over a signal (or image) and sum the element-wise products at each position; they differ only in whether the kernel is flipped first

**[Convolution](https://en.wikipedia.org/wiki/Convolution)**:

- Convolution flips the kernel and slides it over the signal, summing the element-wise products at each shift; the result describes how the kernel (filter) modifies the signal or image.
- In image processing, it is used to apply a filter or kernel to an image.
- Mathematical formulation (discrete signals):
   $$(f * g)[i] = \sum_{j} f[j] \cdot g[i - j]$$

**[Correlation](https://en.wikipedia.org/wiki/Correlation)**:

- Correlation (more precisely, cross-correlation) measures the similarity between two signals as one is shifted over the other, without flipping either of them.
- In image processing, it is used to detect patterns by sliding a filter over an image.
- Mathematical formulation (discrete, real-valued signals):
   $$(f \star g)[i] = \sum_{j} f[j] \cdot g[i + j]$$

<div style="text-align: center; padding-top: 10px;">
    <img src="../assets/images/original/lti/corr-vs-conv.svg" alt="corr-vs-conv.svg" style="min-width: 512px; width: 70%; height: auto; border-radius: 16px;">
    <p><em>Figure 1: Correlation vs. Convolution</em></p>
</div>
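
The two sums above can be checked numerically. The following is a minimal pure-Python sketch, not a library routine: the helper names `correlate_valid` and `convolve_valid` are made up for this illustration, and both evaluate the sum only at positions where the kernel fully overlaps the signal.

```python
def correlate_valid(signal, kernel):
    """Cross-correlation: out[i] = sum_j kernel[j] * signal[i + j]."""
    k = len(kernel)
    return [
        sum(kernel[j] * signal[i + j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def convolve_valid(signal, kernel):
    """Convolution: the same sliding sum, but with the kernel flipped first."""
    return correlate_valid(signal, kernel[::-1])


signal = [1, 2, 3, 4, 5]
kernel = [1, 2, 3]  # asymmetric on purpose, so flipping changes the result

print(correlate_valid(signal, kernel))  # [14, 20, 26]
print(convolve_valid(signal, kernel))   # [10, 16, 22]
```

For a symmetric kernel such as `[2, 1, 2]`, flipping has no effect and the two operations return the same values.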


## <a id='toc2_2_'></a>[Basic Concepts](#toc0_)


### <a id='toc2_2_1_'></a>[Padding](#toc0_)

- It refers to adding extra values (usually zeros) around the input tensor (signal or image) before applying the convolution operation
- Padding is used to control the size of the output and to allow the kernel to process the edges of the input
- `padding='same'`
  - Pads the input so that the output has the same spatial dimensions (height and width for 2D convolutions, length for 1D convolutions) as the input. For stride $1$, no dilation, and an odd kernel size $k$, this requires a padding of $p$ on each side (PyTorch supports `padding='same'` only with a stride of $1$)
    $$p = \frac{k - 1}{2}$$
- `padding='valid'`
  - No padding is applied to the input, so for kernel size $k$ and stride $s$ the output shrinks to
    $$\text{Output Size} = \left\lfloor \frac{\text{Input Size} - k}{s} \right\rfloor + 1$$

<div style="text-align: center; padding-top: 10px;">
    <img src="../assets/images/original/lti/padding.svg" alt="padding.svg" style="min-width: 512px; width: 70%; height: auto; border-radius: 16px;">
    <p><em>Figure 2: Padding for Convolution</em></p>
</div>
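
Both padding modes are special cases of one output-size rule per axis. The sketch below assumes no dilation and symmetric padding of `padding` elements on each side; `conv_output_size` is a hypothetical helper written for this note, not a PyTorch function.

```python
def conv_output_size(n, k, padding=0, stride=1):
    """Output length along one axis: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - k) // stride + 1


n, k = 9, 3  # input length and kernel size (illustrative values)

print(conv_output_size(n, k))                         # 'valid' -> 7
print(conv_output_size(n, k, padding=(k - 1) // 2))   # 'same' (odd k, stride 1) -> 9
print(conv_output_size(n, k, padding=2, stride=2))    # custom padding and stride -> 6
```

The same function applies independently to the height and the width of a 2D input.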

📝 **More details**:

- https://medium.com/analytics-vidhya/convolution-padding-stride-and-pooling-in-cnn-13dc1f3ada26


### <a id='toc2_2_2_'></a>[Stride](#toc0_)

- It defines how much the kernel moves over the input tensor during the convolution
- A stride of `1` moves the kernel one element at a time, so consecutive windows overlap in all but one element
- A stride of `2` moves the kernel two elements at a time (skipping every other position), which downsamples the output to roughly half the input size
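
To make the effect of the stride concrete, the sketch below lists where each kernel window starts on an unpadded 1D input. `window_starts` is a hypothetical helper used only for illustration.

```python
def window_starts(n, k, stride):
    """Start index of every kernel position on a length-n input (no padding)."""
    return list(range(0, n - k + 1, stride))


n, k = 9, 3  # input length and kernel size (illustrative values)

print(window_starts(n, k, stride=1))  # [0, 1, 2, 3, 4, 5, 6]
print(window_starts(n, k, stride=2))  # [0, 2, 4, 6]
```

The number of windows is the output length, so doubling the stride here reduces it from 7 to 4.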

📝 **More details**:

- https://medium.com/analytics-vidhya/convolution-padding-stride-and-pooling-in-cnn-13dc1f3ada26


### <a id='toc2_2_3_'></a>[Dilation](#toc0_)

- It introduces gaps between the elements of the kernel, effectively "spreading out" the kernel
- This allows the kernel to cover a larger area of the input (a kernel of size $k$ with dilation $d$ spans $d(k - 1) + 1$ elements) without increasing the number of parameters (kernel size)
- Dilation is useful for capturing long-range dependencies in the input.

<div style="text-align: center; padding-top: 10px;">
    <img src="../assets/images/original/lti/dilation.svg" alt="dilation.svg" style="min-width: 512px; width: 70%; height: auto; border-radius: 16px;">
    <p><em>Figure 3: Dilation for Convolution</em></p>
</div>
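
As a small illustration of the "spreading out" described above, the sketch below lists which input offsets a dilated kernel touches and how wide a span it covers. Both helper names are hypothetical and exist only for this example.

```python
def dilated_offsets(k, dilation):
    """Input offsets touched by a kernel of size k with the given dilation."""
    return [j * dilation for j in range(k)]


def effective_kernel_size(k, dilation):
    """Span covered on the input: d * (k - 1) + 1."""
    return dilation * (k - 1) + 1


for d in (1, 2, 3):
    print(d, dilated_offsets(3, d), effective_kernel_size(3, d))
# 1 [0, 1, 2] 3
# 2 [0, 2, 4] 5
# 3 [0, 3, 6] 7
```

The kernel always has three weights; only the distance between the input elements it reads changes.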

📝 **More details**:

- https://medium.com/@akp83540/dilation-rate-in-a-convolution-operation-a7143e437654


## <a id='toc2_3_'></a>[Popular CNN Architectures](#toc0_)


### <a id='toc2_3_1_'></a>[Classic / Foundational](#toc0_)


**LeNet-5 (1998)**

- Proposed by [Yann LeCun](https://en.wikipedia.org/wiki/Yann_LeCun) et al. at [AT&T Bell Labs](https://en.wikipedia.org/wiki/Bell_Labs).
- **LeNet-5** is one of the earliest successful Convolutional Neural Networks (CNNs), designed for handwritten digit recognition.
- It introduced core CNN principles such as local receptive fields, weight sharing, and spatial pooling, forming the foundation of modern convolutional architectures.

📝 **Docs**:

- [Gradient-Based Learning Applied to Document Recognition [paper]](http://yann.lecun.com/exdb/publis/pdf/lecun-98.pdf)
- Check detailed info in [**lenet5.ipynb**](./models/cnn/lenet5.ipynb)


**AlexNet (2012)**

- Proposed by [Alex Krizhevsky](https://en.wikipedia.org/wiki/Alex_Krizhevsky), [Ilya Sutskever](https://en.wikipedia.org/wiki/Ilya_Sutskever), and [Geoffrey Hinton](https://en.wikipedia.org/wiki/Geoffrey_Hinton) at the [University of Toronto](https://en.wikipedia.org/wiki/University_of_Toronto).
- **AlexNet** is a deep Convolutional Neural Network that won the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2012 by a large margin.
- It demonstrated the effectiveness of deep CNNs trained on GPUs and popularized key techniques such as ReLU activation, Dropout for regularization, and data augmentation.

📝 **Docs**:

- [ImageNet Classification with Deep Convolutional Neural Networks [paper]](https://papers.nips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf)
- Check detailed info in [**alexnet.ipynb**](./models/cnn/alexnet.ipynb)


### <a id='toc2_3_2_'></a>[Deeper & Structured CNNs](#toc0_)


**VGGNet (2014)**

- Proposed by [Karen Simonyan](https://dblp.uni-trier.de/search/author?author=Karen%20Simonyan) and [Andrew Zisserman](https://dblp.uni-trier.de/pid/z/AndrewZisserman.html?q=Andrew%20Zisserman) at [University of Oxford](https://en.wikipedia.org/wiki/University_of_Oxford).
- **VGGNet** is a deep Convolutional Neural Network known for its simple and uniform architecture using only $3\times3$ convolutional layers stacked on top of each other.
- It demonstrated that depth is critical for good performance and became a standard benchmark in image classification tasks.

📝 **Docs**:

- [Very Deep Convolutional Networks for Large-Scale Image Recognition [paper]](https://arxiv.org/abs/1409.1556)
- Check detailed info in [**vggnet.ipynb**](./models/cnn/vggnet.ipynb)


**GoogLeNet / Inception v1 (2014)**

- Proposed by [Christian Szegedy](https://scholar.google.com/citations?user=bnQMuzgAAAAJ) et al. at [Google Research](https://research.google/) as part of the Inception project.
- **GoogLeNet** introduced the Inception module, combining multiple convolutional filters (1×1, 3×3, 5×5) and pooling in parallel to capture multi-scale features efficiently.
- It achieved state-of-the-art performance on ImageNet while keeping computational cost low, and popularized deeper, more efficient CNN architectures.

📝 **Docs**:

- [Going Deeper with Convolutions [paper]](https://arxiv.org/abs/1409.4842)
- Check detailed info in [**googlenet.ipynb**](./models/cnn/googlenet.ipynb)


**ResNet (2015)**

- Proposed by [Kaiming He](https://scholar.google.com/citations?user=DhtAFkwAAAAJ&hl=en&oi=sra), [Xiangyu Zhang](https://scholar.google.com/citations?user=yuB-cfoAAAAJ&hl=en&oi=sra), [Shaoqing Ren](https://scholar.google.com/citations?user=AUhj438AAAAJ&hl=en&oi=sra), and [Jian Sun](https://scholar.google.com/citations?user=ALVSZAYAAAAJ&hl=en&oi=sra) at [Microsoft Research](https://www.microsoft.com/en-us/research/).
- **ResNet** introduced residual (skip) connections, allowing training of extremely deep networks (up to hundreds of layers) without suffering from vanishing gradients.
- It became a foundational architecture for modern CNNs, influencing image recognition, detection, and segmentation tasks.

📝 **Docs**:

- [Deep Residual Learning for Image Recognition [paper]](https://arxiv.org/abs/1512.03385)
- Check detailed info in [**resnet.ipynb**](./models/cnn/resnet.ipynb)


### <a id='toc2_3_3_'></a>[Efficient / Modern CNNs](#toc0_)


**DenseNet (2017)**

- Proposed by [Gao Huang](https://scholar.google.com.hk/citations?user=-P9LwcgAAAAJ&hl), [Zhuang Liu](https://scholar.google.com/citations?user=7OTD-LEAAAAJ&hl=en&oi=sra), et al. at [Cornell University](https://www.cornell.edu/).
- **DenseNet** introduced dense connectivity, where each layer receives inputs from all preceding layers, improving gradient flow, feature reuse, and parameter efficiency.
- It achieved state-of-the-art performance on image classification benchmarks while being more compact than traditional deep networks.

📝 **Docs**:

- [Densely Connected Convolutional Networks [paper]](https://arxiv.org/abs/1608.06993)
- Check detailed info in [**densenet.ipynb**](./models/cnn/densenet.ipynb)


**MobileNet (2017)**

- Proposed by [Andrew G. Howard](https://scholar.google.com/citations?user=_9l8vD8AAAAJ&hl=en&oi=sra) et al. at [Google Research](https://research.google/).
- **MobileNet** introduced depthwise separable convolutions, drastically reducing the number of parameters and computational cost while maintaining competitive accuracy.
- Designed for mobile and embedded devices, it enables efficient real-time image classification and vision tasks.

📝 **Docs**:

- [MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications [paper]](https://arxiv.org/abs/1704.04861)
<!-- - Check detailed info in [**mobilenet.ipynb**](./models/cnn/mobilenet.ipynb) -->


**Xception (2017)**

- Proposed by [François Chollet](https://scholar.google.com/citations?user=VfYhf2wAAAAJ&hl=en&oi=sra) at [Google Research](https://research.google/).
- **Xception** stands for “Extreme Inception” and replaces Inception modules with **depthwise separable convolutions**, improving efficiency and performance.
- Designed for modern CNN applications, it achieves high accuracy while reducing model parameters and computational cost compared to conventional Inception architectures.  

📝 **Docs**:

- [Xception: Deep Learning with Depthwise Separable Convolutions [paper]](https://arxiv.org/abs/1610.02357)  
<!-- - Check detailed info in [**xception.ipynb**](./models/cnn/xception.ipynb) -->

**EfficientNet (2019)**

- Proposed by [Mingxing Tan](https://scholar.google.com/citations?user=6POeyBoAAAAJ&hl=en&oi=sra) and [Quoc V. Le](https://scholar.google.com/citations?user=vfT6-XIAAAAJ&hl=en&oi=sra) at [Google Research](https://research.google/).
- **EfficientNet** introduced a compound scaling method that uniformly scales network depth, width, and input resolution to achieve better accuracy with fewer parameters.
- It achieved state-of-the-art performance on ImageNet while being highly efficient, inspiring a family of models from EfficientNet-B0 to B7.

📝 **Docs**:

- [EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks [paper]](https://arxiv.org/abs/1905.11946)
- Check detailed info in [**efficientnet.ipynb**](./models/cnn/efficientnet.ipynb)


**ConvNeXt (2022)**

- Proposed by [Zhuang Liu](https://scholar.google.com/citations?user=7OTD-LEAAAAJ&hl=en&oi=sra) et al. at [Facebook AI Research (FAIR)](https://ai.meta.com/).
- **ConvNeXt** modernized CNN design by adopting architectural ideas from Vision Transformers while retaining standard convolutional layers, achieving competitive performance with simpler and more efficient models.
- It demonstrates that carefully redesigned CNNs can match or outperform Transformers on image classification tasks at comparable computational cost.

📝 **Docs**:

- [A ConvNet for the 2020s [paper]](https://arxiv.org/abs/2201.03545)
<!-- - Check detailed info in [**convnext.ipynb**](./models/cnn/convnext.ipynb) -->


## <a id='toc2_4_'></a>[Convolution in PyTorch](#toc0_)

- Convolution operations (e.g. `nn.Conv1d`, `nn.Conv2d`) in PyTorch (and most deep learning frameworks) technically perform **correlation, not convolution**: the kernel is not flipped before it is slid over the input
- Although the operations are named e.g. `Conv2d`, correlation is preferred in practice for two reasons
  1. **Simplicity**:
      - Correlation is easier to implement and understand since it doesn't require flipping the kernel
  1. **Equivalence in Learning**:
      - In the context of CNNs, the kernel weights are learned during training
      - Since the kernels are learned, whether you use convolution or cross-correlation doesn't matter
      - The network can learn equivalent filters regardless of whether the kernel is flipped or not
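
The claim above can be verified directly. In this small sketch (the tensor values are arbitrary and chosen only for illustration), `F.conv1d` reproduces the correlation sum, and a true convolution is obtained by flipping the kernel with `torch.flip` before the call.

```python
import torch
import torch.nn.functional as F

# shapes follow the (batch_size, num_channels, length) convention
signal = torch.tensor([1.0, 2.0, 3.0, 4.0, 5.0]).reshape(1, 1, -1)
kernel = torch.tensor([1.0, 2.0, 3.0]).reshape(1, 1, -1)  # asymmetric on purpose

# F.conv1d slides the kernel as-is: this is cross-correlation
corr = F.conv1d(signal, kernel)

# flipping the kernel along its last axis first yields a true convolution
conv = F.conv1d(signal, torch.flip(kernel, dims=[-1]))

print(corr.flatten().tolist())  # [14.0, 20.0, 26.0]
print(conv.flatten().tolist())  # [10.0, 16.0, 22.0]
```

With a learned kernel the distinction disappears in practice, since training can arrive at either the filter or its flipped version.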

📝 **Docs**:

- `torch.nn.Conv1d`: [docs.pytorch.org/docs/stable/generated/torch.nn.Conv1d.html](https://docs.pytorch.org/docs/stable/generated/torch.nn.Conv1d.html)
- `torch.nn.Conv2d`: [docs.pytorch.org/docs/stable/generated/torch.nn.Conv2d.html](https://docs.pytorch.org/docs/stable/generated/torch.nn.Conv2d.html)
- `torch.nn.Conv3d`: [docs.pytorch.org/docs/stable/generated/torch.nn.Conv3d.html](https://docs.pytorch.org/docs/stable/generated/torch.nn.Conv3d.html)
- `torch.nn.functional.conv1d`: [docs.pytorch.org/docs/stable/generated/torch.nn.functional.conv1d.html](https://docs.pytorch.org/docs/stable/generated/torch.nn.functional.conv1d.html)
- `torch.nn.functional.conv2d`: [docs.pytorch.org/docs/stable/generated/torch.nn.functional.conv2d.html](https://docs.pytorch.org/docs/stable/generated/torch.nn.functional.conv2d.html)
- `torch.nn.functional.conv3d`: [docs.pytorch.org/docs/stable/generated/torch.nn.functional.conv3d.html](https://docs.pytorch.org/docs/stable/generated/torch.nn.functional.conv3d.html)


### <a id='toc2_4_1_'></a>[1D Correlation](#toc0_)


In [None]:
# create a 1D signal and a kernel
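# float dtype is used for consistency with the 2D example below
signal_1d = torch.arange(1, 10, dtype=torch.float32).reshape(1, 1, -1)  # shape: [1, 1, 9] -> (batch_size, num_channels, signal_length)
kernel_1d = torch.tensor([2, 1, 2], dtype=torch.float32).reshape(1, 1, -1)  # shape: [1, 1, 3]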

In [None]:
# applies convolution with "same" padding, output size is the same as input size
conv_1d_1 = F.conv1d(
    signal_1d,
    kernel_1d,
    padding="same",
)

# applies convolution with "valid" padding, no padding is added, so the output size is reduced
conv_1d_2 = F.conv1d(
    signal_1d,
    kernel_1d,
    padding="valid",
)

# applies convolution with a padding of 2 and a stride of 2, which results in downsampling the output
conv_1d_3 = F.conv1d(
    signal_1d,
    kernel_1d,
    padding=2,
    stride=2,
)

# log
print(f"conv_1d_1 : {conv_1d_1}")
print(f"conv_1d_2 : {conv_1d_2}")
print(f"conv_1d_3 : {conv_1d_3}")

In [None]:
# plot
fig, axs = plt.subplots(nrows=1, ncols=4, figsize=(16, 4), layout="compressed")

axs[0].plot(signal_1d.squeeze(), marker="o", label="Original Signal")
axs[0].plot(kernel_1d.squeeze(), marker="o", color="purple", label="Kernel")
axs[0].set_title("Original Signal")
axs[0].legend()
axs[1].plot(conv_1d_1.squeeze(), marker="o", color="orange")
axs[1].set_title('Convolution with "Same" Padding')
axs[2].plot(conv_1d_2.squeeze(), marker="o", color="green")
axs[2].set_title('Convolution with "Valid" Padding')
axs[3].plot(conv_1d_3.squeeze(), marker="o", color="red")
axs[3].set_title("Convolution with Custom Padding and Stride")

plt.show()

### <a id='toc2_4_2_'></a>[2D Correlation](#toc0_)


In [None]:
# create a 2D signal (image) and a kernel
signal_2d = torch.arange(1, 26, dtype=torch.float32).reshape(1, 1, 5, 5)  # (batch_size, num_channels, height, width)
kernel_2d = torch.tensor([[1, 0, -1], [1, 0, -1], [1, 0, -1]], dtype=torch.float32).reshape(1, 1, 3, 3)

In [None]:
# applies convolution with "same" padding, output size is the same as input size
conv_2d_1 = F.conv2d(signal_2d, kernel_2d, padding="same")

# applies convolution with "valid" padding, no padding is added, so the output size is reduced
conv_2d_2 = F.conv2d(signal_2d, kernel_2d, padding="valid")

# applies convolution with a padding of 1 and a stride of 2, which results in downsampling the output
conv_2d_3 = F.conv2d(signal_2d, kernel_2d, padding=1, stride=2)

In [None]:
# plot
fig, axs = plt.subplots(nrows=1, ncols=5, figsize=(20, 4), layout="compressed")

axs[0].imshow(signal_2d.squeeze(), cmap="gray")
axs[0].set(title="Original Signal (Image)", xticks=range(signal_2d.shape[3]), yticks=range(signal_2d.shape[2]))
axs[1].imshow(kernel_2d.squeeze(), cmap="gray")
axs[1].set(title="Kernel", xticks=range(kernel_2d.shape[3]), yticks=range(kernel_2d.shape[2]))
axs[2].imshow(conv_2d_1.squeeze(), cmap="gray")
axs[2].set(title='Convolution with "Same" Padding', xticks=range(conv_2d_1.shape[3]), yticks=range(conv_2d_1.shape[2]))
axs[3].imshow(conv_2d_2.squeeze(), cmap="gray")
axs[3].set(title='Convolution with "Valid" Padding', xticks=range(conv_2d_2.shape[3]), yticks=range(conv_2d_2.shape[2]))
axs[4].imshow(conv_2d_3.squeeze(), cmap="gray")
axs[4].set(
    title="Convolution with Custom Padding and Stride",
    xticks=range(conv_2d_3.shape[3]),
    yticks=range(conv_2d_3.shape[2]),
)

plt.show()

## <a id='toc2_5_'></a>[CNN Implementation](#toc0_)

- CNNs are a class of deep learning models specifically designed for processing structured grid-like data, such as images, videos, and certain types of sequential data.

**Key Components of CNNs**

1. **Feature Extraction**
   - **Convolutional Layers**
     - Core building block of a CNN.
     - Slide a filter (kernel) over the input data to produce a feature map.
   - **Pooling Layers**
     - Reduce spatial dimensions of feature maps.
     - Help make the model invariant to small translations and reduce computation.
     - Types:
       - **Max Pooling:** Takes the maximum value from each patch.
       - **Average Pooling:** Takes the average value from each patch.
1. **Classification**
   - Flatten the features extracted by convolution/pooling layers and pass through fully connected layers (MLP).
   - Performs final classification or regression task based on extracted features.
   - See [**Multi-Layer Perceptron (MLP)**](./05-multi-layer-perceptrons.ipynb).
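
The two pooling types listed above can be illustrated on a small grid. This is a pure-Python sketch with made-up values; `pool2d` and `mean` are hypothetical helpers, and the window is assumed to be non-overlapping (stride equal to the window size).

```python
def pool2d(image, size, reduce):
    """Apply `reduce` to each non-overlapping size x size patch."""
    rows, cols = len(image), len(image[0])
    return [
        [
            reduce([image[r + dr][c + dc] for dr in range(size) for dc in range(size)])
            for c in range(0, cols - size + 1, size)
        ]
        for r in range(0, rows - size + 1, size)
    ]


def mean(values):
    return sum(values) / len(values)


image = [
    [1, 3, 2, 0],
    [5, 7, 1, 4],
    [0, 2, 9, 8],
    [6, 4, 3, 5],
]

print(pool2d(image, 2, max))   # [[7, 4], [6, 9]]
print(pool2d(image, 2, mean))  # [[4.0, 1.75], [3.0, 6.25]]
```

Either way, a 4×4 feature map is reduced to 2×2, which is the spatial reduction the pooling layers contribute.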

<div style="text-align: center; padding-top: 10px;">
    <img src="../assets/images/original/cnn/cnn-general.svg" alt="cnn-general.svg" style="min-width: 512px; max-width: 100%; height: auto; border-radius: 16px;">
    <p><em>Figure 4: Convolutional Neural Networks Model</em></p>
</div>

**Calculating the number of parameters**:

<table style="margin: 0 auto; text-align:center;">
  <thead>
    <tr>
      <th colspan="4">Feature Extraction (Convolutional Layers)</th>
      <th colspan="4">Classification (Fully Connected Layers)</th>
    </tr>
    <tr>
      <th colspan="2">Convolution<sub>1</sub></th>
      <th colspan="2">Convolution<sub>L</sub></th>
      <th colspan="2">Hidden<sub>1</sub></th>
      <th colspan="2">Output (Logits)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Weights</td>
      <td>Biases</td>
      <td>Weights</td>
      <td>Biases</td>
      <td>Weights</td>
      <td>Biases</td>
      <td>Weights</td>
      <td>Biases</td>
    </tr>
    <tr>
      <td>(c<sub>0</sub> × s<sub>1</sub> × s<sub>2</sub>) × c<sub>1</sub></td>
      <td>c<sub>1</sub></td>
      <td>(c<sub>L-1</sub> × s<sub>1</sub> × s<sub>2</sub>) × c<sub>L</sub></td>
      <td>c<sub>L</sub></td>
      <td>d<sub>0</sub> × h<sub>1</sub></td>
      <td>h<sub>1</sub></td>
      <td>h<sub>L-1</sub> × o</td>
      <td>o</td>
    </tr>
  </tbody>
  <tfoot>
    <tr>
      <td colspan="2">(c<sub>0</sub> × s<sub>1</sub> × s<sub>2</sub> + 1) × c<sub>1</sub></td>
      <td colspan="2">(c<sub>L-1</sub> × s<sub>1</sub> × s<sub>2</sub> + 1) × c<sub>L</sub></td>
      <td colspan="2">(d<sub>0</sub> + 1) × h<sub>1</sub></td>
      <td colspan="2">(h<sub>L-1</sub> + 1) × o</td>
    </tr>
  </tfoot>
</table>
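
The formulas in the table's footer row translate into two one-line functions. This is a minimal sketch with hypothetical helper names, where `k_h` and `k_w` play the role of the kernel sizes s<sub>1</sub> and s<sub>2</sub>; the layer sizes in the example calls are assumed purely for illustration.

```python
def conv2d_params(c_in, c_out, k_h, k_w):
    """(c_in * k_h * k_w + 1) * c_out: one kernel plus one bias per output channel."""
    return (c_in * k_h * k_w + 1) * c_out


def linear_params(d_in, d_out):
    """(d_in + 1) * d_out: one weight per input plus one bias, per output unit."""
    return (d_in + 1) * d_out


# illustrative layer sizes (assumed for this sketch)
print(conv2d_params(c_in=1, c_out=6, k_h=5, k_w=5))  # 156
print(linear_params(d_in=400, d_out=120))            # 48120
```

Note that the convolutional count does not depend on the input's height or width, whereas the first fully connected layer depends on the flattened feature size.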

**Training a CNN**:

- **Forward Pass:** Compute outputs using current weights and biases.
- **Loss Function:** E.g., Cross-Entropy Loss for classification, Mean Squared Error for regression.
- **Backward Pass (Backpropagation):** Compute gradients of the loss w.r.t. weights and biases.
- **Weight Update:** Update parameters using optimizers like Gradient Descent or Adam.
- **Regularization:** Use Dropout, Batch Normalization, etc., to prevent overfitting and stabilize training.
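
The steps above map onto a short PyTorch loop. The following is a minimal sketch under stated assumptions: the model is deliberately tiny, the inputs and labels are random rather than a real dataset, and plain SGD with a learning rate of `0.1` is an arbitrary choice.

```python
import torch
from torch import nn

torch.manual_seed(0)

# tiny illustrative model and random data (assumed sizes, not a real dataset)
model = nn.Sequential(
    nn.Conv2d(1, 4, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),           # 8x8 -> 4x4
    nn.Flatten(),              # 4 channels * 4 * 4 = 64 features
    nn.Linear(4 * 4 * 4, 3),
)
x = torch.randn(16, 1, 8, 8)
y = torch.randint(0, 3, (16,))

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

losses = []
for step in range(20):
    optimizer.zero_grad()        # clear gradients from the previous step
    logits = model(x)            # forward pass
    loss = criterion(logits, y)  # loss function
    loss.backward()              # backward pass (backpropagation)
    optimizer.step()             # weight update
    losses.append(loss.item())

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

Because the same batch is reused at every step, the loss should fall as the model memorizes it; a real training loop would iterate over mini-batches from a data loader instead.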

**Applications of CNNs**:

- Image Classification
- Object Detection
- Segmentation
- Face Recognition

✍️ **Notes**:

- **`torch.nn.Conv2d`**
    - **Loss functions:**
        - Multi-class classification: `torch.nn.CrossEntropyLoss` = `LogSoftmax` + `NLLLoss`
        - [CrossEntropyLoss docs](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html)
        - [NLLLoss docs](https://pytorch.org/docs/stable/generated/torch.nn.NLLLoss.html)
    - **Activation for last layer:**
        - When using `CrossEntropyLoss`, no activation is needed; it internally computes LogSoftmax + NLLLoss.
        - [Softmax docs](https://pytorch.org/docs/stable/generated/torch.nn.Softmax.html)
        - [LogSoftmax docs](https://pytorch.org/docs/stable/generated/torch.nn.LogSoftmax.html)
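    - **Weights:**
        - Kaiming/He initialization (designed for ReLU activations), where $n_\text{in} = c_\text{in} \times k_h \times k_w$ is the fan-in of the layer
        - Uniform: $W \sim \mathcal{U}\left(-\sqrt{\frac{6}{n_\text{in}}}, \sqrt{\frac{6}{n_\text{in}}}\right)$
        - Normal: $W \sim \mathcal{N}\left(0, \frac{2}{n_\text{in}}\right)$ (the second argument is the variance)
    - **Biases:** Initialized to zero in the He et al. scheme
    - PyTorch's default for `nn.Conv2d` differs slightly: weights use `kaiming_uniform_` with `a=sqrt(5)`, which gives $\mathcal{U}\left(-\frac{1}{\sqrt{n_\text{in}}}, \frac{1}{\sqrt{n_\text{in}}}\right)$, and biases are drawn from the same range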
    - [Initialization docs](https://pytorch.org/docs/stable/nn.init.html)
    - Paper: [Delving deep into rectifiers: Surpassing human-level performance on ImageNet - He et al., 2015](https://www.cv-foundation.org/openaccess/content_iccv_2015/papers/He_Delving_Deep_into_ICCV_2015_paper.pdf)
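
The identity `CrossEntropyLoss` = `LogSoftmax` + `NLLLoss` noted above can be worked through by hand for a single sample. This is a pure-Python sketch with made-up logits; the helper names are hypothetical and mirror the PyTorch modules only conceptually.

```python
import math


def log_softmax(logits):
    """log(softmax(z)), computed stably by subtracting max(z) first."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum_exp for z in logits]


def nll(log_probs, target):
    """Negative log-likelihood of the target class."""
    return -log_probs[target]


logits = [2.0, 0.5, -1.0]  # raw scores for 3 classes (illustrative)
target = 0                 # index of the true class

loss = nll(log_softmax(logits), target)
print(round(loss, 4))  # 0.2413
```

This is why the last layer outputs raw logits: applying a softmax before the loss would normalize the scores twice.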

🛝 **Playgrounds**:

- [CNN Explainer](https://poloclub.github.io/cnn-explainer/)
- [Image Similarity Search](https://convnetplayground.fastforwardlabs.com/)
- [NN-SVG](https://alexlenail.me/NN-SVG/)


### <a id='toc2_5_1_'></a>[Using PyTorch](#toc0_)

<div style="text-align: center; padding-top: 10px;">
    <img src="../assets/images/original/cnn/cnn-example.svg" alt="cnn-example.svg" style="min-width: 512px; max-width: 100%; height: auto; border-radius: 16px;">
    <p><em>Figure 5: A Simple Example using Convolutional Neural Networks</em></p>
</div>

**Calculating the number of parameters**:

<table style="margin: 0 auto; text-align:center;">
  <thead>
    <tr>
      <th colspan="4">Feature Extraction (Convolutional Layers)</th>
      <th colspan="4">Classification (Fully Connected Layers)</th>
    </tr>
    <tr>
      <th colspan="2">Convolution<sub>1</sub></th>
      <th colspan="2">Convolution<sub>L</sub></th>
      <th colspan="2">Hidden<sub>1</sub></th>
      <th colspan="2">Output (Logits)</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Weights</td>
      <td>Biases</td>
      <td>Weights</td>
      <td>Biases</td>
      <td>Weights</td>
      <td>Biases</td>
      <td>Weights</td>
      <td>Biases</td>
    </tr>
    <tr>
      <td>(3 × 3 × 3) × 8</td>
      <td>8</td>
      <td>(8 × 3 × 3) × 16</td>
      <td>16</td>
      <td>1024 × 32</td>
      <td>32</td>
      <td>32 × 10</td>
      <td>10</td>
    </tr>
  </tbody>
  <tfoot>
    <tr>
      <td colspan="2">(3 × 3 × 3 + 1) × 8</td>
      <td colspan="2">(8 × 3 × 3 + 1) × 16</td>
      <td colspan="2">(1024 + 1) × 32</td>
      <td colspan="2">(32 + 1) × 10</td>
    </tr>
    <tr style="border-top: 2px solid; font-weight: bold;">
      <td colspan="8">Total Parameters: 224 + 1168 + 32800 + 330 = <strong>34522</strong></td>
    </tr>
  </tfoot>
</table>
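
The totals in the table can be cross-checked against PyTorch's own parameter count. The sketch below uses a stand-in `nn.Sequential` with the same layer sizes and assumes a 32×32 RGB input with 2×2 max pooling after each convolution, which is where the flattened size of 1024 comes from.

```python
import torch
from torch import nn

# stand-in model with the layer sizes from the table (32x32 RGB input assumed)
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),   # 32x32 -> 16x16
    nn.Conv2d(8, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),   # 16x16 -> 8x8
    nn.Flatten(),      # 16 * 8 * 8 = 1024
    nn.Linear(1024, 32),
    nn.ReLU(),
    nn.Linear(32, 10),
)

# count weights and biases per layer, skipping parameter-free layers
per_layer = [
    sum(p.numel() for p in layer.parameters())
    for layer in model
    if list(layer.parameters())
]

print(per_layer)       # [224, 1168, 32800, 330]
print(sum(per_layer))  # 34522
```

Almost all of the parameters sit in the first fully connected layer, which is typical for small CNNs that flatten a feature map directly.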

- Refer to [**cifar10-classification.ipynb**](./projects/cifar-classification/cifar-10/implementation-1/cifar10-classification.ipynb) for a comprehensive example on the CNN concept.

📚 **Tutorials**:

- Neural Networks: [pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial](https://pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial)
- Training a Classifier: [pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html](https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html)


In [None]:
class PytorchCNN(nn.Module):
    def __init__(
        self,
        in_channels: int,
        conv_channels: list[int],
        kernel_size: int,
        hidden_sizes: list[int],
        n_output: int,
        input_height: int,
        input_width: int,
    ):
        super().__init__()

        layers = []
        c_in = in_channels
        h, w = input_height, input_width

        # convolutional feature extractor
        for c_out in conv_channels:
            layers.append(nn.Conv2d(c_in, c_out, kernel_size=kernel_size, padding=kernel_size // 2))
            layers.append(nn.ReLU())
            layers.append(nn.MaxPool2d(kernel_size=2))

            # update spatial size after pooling
            h //= 2
            w //= 2
            c_in = c_out

        self.features = nn.Sequential(*layers)

        # compute flattened dimension
        d_flat = c_in * h * w

        # fully connected classifier
        fc_layers = []
        in_features = d_flat
        for h_size in hidden_sizes:
            fc_layers.append(nn.Linear(in_features, h_size))
            fc_layers.append(nn.ReLU())
            in_features = h_size

        fc_layers.append(nn.Linear(in_features, n_output))
        self.classifier = nn.Sequential(*fc_layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        x = torch.flatten(x, start_dim=1)
        x = self.classifier(x)
        return x

In [None]:
# parameters
batch_size = 100
n_channels = 3          # e.g., RGB images
height, width = 32, 32  # image size
n_output = 10           # number of classes

# random input data
X = torch.randn(batch_size, n_channels, height, width)

# random labels for classification
y_true = torch.randint(0, n_output, (batch_size,))

In [None]:
# instantiate model
pytorch_model = PytorchCNN(
    in_channels=3,
    conv_channels=[8, 16],
    kernel_size=3,
    hidden_sizes=[32],
    n_output=10,
    input_height=32,
    input_width=32,
)

pytorch_model

In [None]:
summary(pytorch_model, input_size=(batch_size, n_channels, height, width), device="cpu")

In [None]:
# forward pass
logits = pytorch_model(X)

# log
print(f"Logits:\n{logits}")

In [None]:
# define loss function
criterion = nn.CrossEntropyLoss()  # expects logits + integer labels

# compute loss
loss = criterion(logits, y_true)
print(f"loss: {loss.item()}")

In [None]:
# backward pass
loss.backward()  # computes gradients for all parameters

# log
print(f"gradients for first layer weights:\n{pytorch_model.features[0].weight.grad}")