<a href="https://colab.research.google.com/github/pablomiralles22/class-CV-computer-vision/blob/main/Convolution_MNIST.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import datasets, transforms

## 📦 Data Loading
In this section, we load the MNIST dataset. MNIST is a classic dataset of handwritten digits (0–9): 60,000 training images and 10,000 test images, each a 28×28 grayscale picture. We'll use this data to train a simple image classification model.

In [None]:
# --- Data ---
transform = transforms.ToTensor()

train_dataset = datasets.MNIST('.', train=True, download=True, transform=transform)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True)

val_dataset = datasets.MNIST('.', train=False, transform=transform)
val_loader = torch.utils.data.DataLoader(val_dataset, batch_size=64, shuffle=False)

Let's explain the code step by step:

```python
transform = transforms.ToTensor()
```

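* Converts a PIL image or NumPy array of shape `[H, W, C]` into a PyTorch tensor of shape `[C, H, W]`. For a grayscale MNIST digit this gives `[1, 28, 28]`.
* Scales pixel values from `[0, 255]` to `[0.0, 1.0]` (for 8-bit images such as MNIST).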

```python
train_dataset = datasets.MNIST('.', train=True, download=True, transform=transform)
```

* Loads the **training set** of the MNIST dataset.
* Uses the current directory (`'.'`) as the root folder; torchvision stores the files in an `MNIST/` subfolder there.
* Downloads the dataset if not already present.
* Applies the `ToTensor()` transform to every image as it is loaded.

```python
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=64, shuffle=True)
```

* Wraps the training dataset in a data loader.
* Loads data in **mini-batches of 64**.
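* Reshuffles the data at the start of every epoch, so the model does not see the samples in the same order (and grouped into the same batches) each time.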
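To see what the loader actually yields, the sketch below swaps MNIST for a small random `TensorDataset` with the same image shape (an assumption made so the example runs without downloading anything) and inspects one batch.

```python
import math
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for MNIST: 1,000 random "images" with the shape
# ToTensor() produces ([1, 28, 28], values in [0, 1]) and random labels 0-9.
images = torch.rand(1000, 1, 28, 28)
labels = torch.randint(0, 10, (1000,))
fake_dataset = TensorDataset(images, labels)

loader = DataLoader(fake_dataset, batch_size=64, shuffle=True)

x, y = next(iter(loader))
print(x.shape)  # torch.Size([64, 1, 28, 28]): [B, C, H, W]
print(y.shape)  # torch.Size([64]): one integer label per image

# drop_last defaults to False, so the final batch keeps the leftovers:
# 1000 = 15 * 64 + 40
batch_sizes = [xb.shape[0] for xb, _ in loader]
print(len(loader), math.ceil(1000 / 64))  # 16 16
print(batch_sizes[-1])  # 40
```

With the real training set the same arithmetic gives `math.ceil(60000 / 64) = 938` batches per epoch, the last one holding 32 images.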

```python
val_dataset = datasets.MNIST('.', train=False, transform=transform)
```

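* Loads the **test split** of MNIST, used here as the validation set.
* `download=True` is not needed because the call above already downloaded all MNIST files, including the test split. Shuffling is not a dataset option; it is controlled by the `DataLoader` below.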

```python
val_loader = torch.utils.data.DataLoader(val_dataset, batch_size=64, shuffle=False)
```

* Loads the validation data in batches of 64.
* **No shuffling**, preserving original order (typical for evaluation).
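Keeping the order fixed means the i-th prediction always corresponds to `val_dataset[i]`, which makes it easy to look up misclassified images afterwards. The sketch below checks this on a small synthetic dataset (a hypothetical stand-in, not MNIST) whose labels are simply the sample indices.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for the validation split: 200 samples whose
# "labels" are just their indices, so the order is easy to check.
images = torch.rand(200, 1, 28, 28)
labels = torch.arange(200)
fake_val = TensorDataset(images, labels)

ordered = DataLoader(fake_val, batch_size=64, shuffle=False)
shuffled = DataLoader(fake_val, batch_size=64, shuffle=True)

ordered_labels = torch.cat([y for _, y in ordered])
shuffled_labels = torch.cat([y for _, y in shuffled])

# shuffle=False: batches come out in dataset order, so position i is sample i
print(torch.equal(ordered_labels, labels))  # True
# shuffle=True: the same samples, just in a different order
print(torch.equal(shuffled_labels.sort().values, labels))  # True
```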


## 🧠 Define the Neural Network
Here, we define our neural network using PyTorch's `nn.Module`. It's a simple Convolutional Neural Network (CNN) with three convolutional layers followed by a final classification layer. CNNs are especially good for image recognition tasks.

In [None]:
# --- Model ---
class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, stride=2),   # [B, 1, 28, 28] -> [B, 16, 13, 13]
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2),  # [B, 16, 13, 13] -> [B, 32, 6, 6]
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2),  # [B, 32, 6, 6] -> [B, 64, 2, 2]
            nn.ReLU(),
        )
        self.classifier = nn.Linear(64 * 2 * 2, 10)  # Flatten: [B, 256] -> [B, 10]

    def forward(self, x):
        x = self.net(x)
        x = x.flatten(start_dim=1)
        return self.classifier(x)

```python
class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
```

* Defines a custom neural network class inheriting from `nn.Module`.

```python
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, stride=2),   # [B, 1, 28, 28] -> [B, 16, 13, 13]
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2),  # [B, 16, 13, 13] -> [B, 32, 6, 6]
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2),  # [B, 32, 6, 6] -> [B, 64, 2, 2]
            nn.ReLU(),
        )
```

* A **convolutional feature extractor** with 3 layers:

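  * Each `Conv2d` roughly halves the spatial dimensions due to `stride=2` (28 → 13 → 6 → 2; with no padding the result is rounded down, so it is slightly less than half).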
  * `ReLU` adds non-linearity after each conv layer.
  * Final output shape after convs: `[B, 64, 2, 2]`.
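The shapes in the comments follow from the output-size formula in the `nn.Conv2d` documentation: `H_out = floor((H_in + 2*padding - dilation*(kernel_size - 1) - 1) / stride) + 1`. The sketch below applies it to reproduce 28 → 13 → 6 → 2 and cross-checks against real layers. The helper `conv_out_size` is hypothetical, written only for this illustration.

```python
import torch
import torch.nn as nn

def conv_out_size(size, kernel_size=3, stride=2, padding=0, dilation=1):
    """Output height/width of a Conv2d layer (formula from the PyTorch docs)."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

# Arithmetic version: 28 -> 13 -> 6 -> 2
sizes = [28]
for _ in range(3):
    sizes.append(conv_out_size(sizes[-1]))
print(sizes)  # [28, 13, 6, 2]

# Cross-check with real layers matching SimpleCNN's feature extractor
net = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, stride=2), nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, stride=2), nn.ReLU(),
    nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
)
out = net(torch.zeros(1, 1, 28, 28))
print(out.shape)  # torch.Size([1, 64, 2, 2])
```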

```python
        self.classifier = nn.Linear(64 * 2 * 2, 10)
```

* A **fully connected layer** that maps flattened conv output (size 256) to 10 output classes (digits 0–9).
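Counting parameters shows how small this model is. The linear layer has `256 * 10` weights plus 10 biases, and each convolution has `out_channels * in_channels * 3 * 3` weights plus one bias per output channel. A minimal sketch (the class is repeated so the cell runs on its own):

```python
import torch
import torch.nn as nn

# Same architecture as SimpleCNN above, repeated so this cell is self-contained
class SimpleCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.ReLU(),
        )
        self.classifier = nn.Linear(64 * 2 * 2, 10)

    def forward(self, x):
        return self.classifier(self.net(x).flatten(start_dim=1))

model = SimpleCNN()

# Linear layer: 256 * 10 weights + 10 biases
print(sum(p.numel() for p in model.classifier.parameters()))  # 2570
# Whole model: conv layers (160 + 4640 + 18496) + classifier (2570)
print(sum(p.numel() for p in model.parameters()))  # 25866
```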

```python
    def forward(self, x):
        x = self.net(x)                     # Pass through conv layers
        x = x.flatten(start_dim=1)          # Flatten to shape [B, 256]
        return self.classifier(x)           # Class scores (logits)
```

* **Forward pass logic**:

  1. Extract features.
  2. Flatten features.
  3. Classify with linear layer.
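`forward` returns raw class scores (logits), not probabilities, so getting a prediction or a confidence value at inference time takes one more step. A minimal sketch with made-up logits standing in for real model output:

```python
import torch

# Hypothetical logits for a batch of 2 images (10 classes each),
# the kind of [B, 10] output SimpleCNN.forward returns
logits = torch.tensor([
    [0.1, 0.2, 3.0, 0.0, -1.0, 0.5, 0.3, 0.0, 0.2, 0.1],
    [2.5, 0.0, 0.1, 0.3, 0.2, 0.0, 0.1, 4.0, 0.0, -0.5],
])

probs = torch.softmax(logits, dim=1)  # each row becomes a probability distribution
preds = logits.argmax(dim=1)          # index of the highest score in each row

print(preds)             # tensor([2, 7])
print(probs.sum(dim=1))  # each row sums to 1 (up to rounding)
```

Taking `argmax` of the logits directly gives the same class as taking it of the probabilities, since softmax preserves the ordering within each row.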


## 🏋️‍♀️ Training the model

In [None]:
# --- Training ---
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = SimpleCNN().to(device)
optimizer = optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()


for epoch in range(5):  # 5 epochs (full passes over the training set)
    print(f"Epoch {epoch}")

    # Train
    model.train()
    train_losses = []
    for x, y in train_loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        out = model(x)
        loss = criterion(out, y)
        loss.backward()
        optimizer.step()
        train_losses.append(loss.item())
    print(f"Train loss: {sum(train_losses) / len(train_losses)}")

    # Validation
    val_losses = []
    with torch.no_grad():
        for x, y in val_loader:
            x, y = x.to(device), y.to(device)
            out = model(x)
            loss = criterion(out, y)
            val_losses.append(loss.item())
    print(f"Val loss: {sum(val_losses) / len(val_losses)}")


Epoch 0
Train loss: 0.36047024827108964
Val loss: 0.14033805436801736
Epoch 1
Train loss: 0.11889521469538814
Val loss: 0.08872888061984674
Epoch 2
Train loss: 0.08452595395273341
Val loss: 0.06738804081850834
Epoch 3
Train loss: 0.06646266297550439
Val loss: 0.06254863348225233
Epoch 4
Train loss: 0.05486489250709408
Val loss: 0.05864701733620805


```python
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
```

* Selects **GPU** if available, else falls back to **CPU**.

```python
model = SimpleCNN().to(device)
optimizer = optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
```

* Instantiates the model and moves it to the chosen device.
* Uses **Adam** optimizer with a learning rate of `1e-3`.
* Loss function: **Cross-Entropy**, the standard choice for multi-class classification. It takes the raw logits from the model and the integer labels directly.
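`nn.CrossEntropyLoss` applies log-softmax internally, which is why `SimpleCNN` ends with a plain linear layer and no softmax. The sketch below uses random logits as a hypothetical stand-in for model output and checks that the loss equals the mean negative log-probability assigned to the correct class.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)           # hypothetical batch: 4 images, 10 classes
targets = torch.tensor([3, 0, 9, 1])  # integer class labels, as MNIST provides

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, targets)

# Same value by hand: mean over the batch of -log(softmax(logits)[target])
log_probs = F.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(4), targets].mean()

print(torch.allclose(loss, manual))  # True
```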


🔄 **Epoch Loop**

```python
for epoch in range(5):
    print(f"Epoch {epoch}")
```

* Trains for 5 epochs, i.e. five full passes over the training set.


🏋️ **Training Phase**

```python
model.train()
train_losses = []
for x, y in train_loader:
    x, y = x.to(device), y.to(device)
    optimizer.zero_grad()
    out = model(x)
    loss = criterion(out, y)
    loss.backward()
    optimizer.step()
    train_losses.append(loss.item())
print(f"Train loss: {sum(train_losses) / len(train_losses)}")
```

* Sets model to **training mode** (`model.train()`).
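* Iterates over mini-batches of training data and moves each batch to the selected device.
* Clears gradients → forward pass → computes loss → backprop → optimizer step.
* Collects the loss of every batch and prints their mean. Because the weights keep improving during the epoch, this average includes the higher losses of the early batches; that is one reason the epoch-0 training loss (0.36) sits well above the validation loss measured after the epoch (0.14).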
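The call to `optimizer.zero_grad()` matters because PyTorch adds each new gradient to whatever is already stored in a parameter's `.grad`. Without clearing, every step would use the sum of the gradients of all previous batches. A minimal demonstration on a tiny hypothetical `nn.Linear` layer:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(3, 1)  # tiny hypothetical model
x = torch.randn(5, 3)

# First backward pass: gradients are written into .grad
layer(x).sum().backward()
g1 = layer.weight.grad.clone()

# Second backward pass WITHOUT clearing: the new gradient is added on top
layer(x).sum().backward()
g2 = layer.weight.grad.clone()
print(torch.allclose(g2, 2 * g1))  # True: gradients accumulated

# Clearing first, as optimizer.zero_grad() does for the parameters it manages
layer.zero_grad()
layer(x).sum().backward()
print(torch.allclose(layer.weight.grad, g1))  # True
```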


🧪 **Validation Phase**

```python
val_losses = []
with torch.no_grad():
    for x, y in val_loader:
        x, y = x.to(device), y.to(device)
        out = model(x)
        loss = criterion(out, y)
        val_losses.append(loss.item())
print(f"Val loss: {sum(val_losses) / len(val_losses)}")
```

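* Wraps the loop in `torch.no_grad()`, which stops autograd from recording operations. Gradients are only needed for the optimizer step, and the model is not trained during validation, so skipping them saves memory and computation.
* Evaluates the model on the validation data. The loop does not call `model.eval()`; this is harmless here because `SimpleCNN` has no dropout or batch-normalization layers (the layers whose behavior differs between training and evaluation mode), but calling it before validation is the usual practice.
* Computes and reports the average validation loss. Because these images are never used for training, this loss gives a more realistic picture of how well the model generalizes. If the training loss keeps falling while the validation loss stalls or rises, the model is likely overfitting to the training data.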
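The loop above reports only loss. A common addition is accuracy, the fraction of images whose highest-scoring class matches the label. Below is a minimal sketch of a hypothetical `evaluate` helper (not part of the original notebook) that also switches to evaluation mode. It is exercised on random data and a stand-in linear model, so the printed numbers are meaningless; in the notebook you would pass `model`, `val_loader`, `criterion` and `device` instead.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def evaluate(model, loader, criterion, device):
    """Return (average loss per sample, accuracy) over a data loader."""
    model.eval()  # evaluation mode (matters for dropout / batch norm)
    total_loss, correct, count = 0.0, 0, 0
    with torch.no_grad():  # no autograd bookkeeping needed
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            out = model(x)
            # criterion returns the batch mean, so scale back to a sum
            total_loss += criterion(out, y).item() * y.size(0)
            correct += (out.argmax(dim=1) == y).sum().item()
            count += y.size(0)
    return total_loss / count, correct / count

# Usage on hypothetical random data and a stand-in linear model
device = torch.device("cpu")
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10)).to(device)
loader = DataLoader(
    TensorDataset(torch.rand(100, 1, 28, 28), torch.randint(0, 10, (100,))),
    batch_size=64,
)
val_loss, val_acc = evaluate(model, loader, nn.CrossEntropyLoss(), device)
print(f"Val loss: {val_loss:.4f}, accuracy: {val_acc:.2%}")
```

Weighting each batch by its size gives the exact per-sample average. The notebook's mean of batch means gives the smaller final batch (16 of the 10,000 test images) the same weight as a full batch, so the two numbers can differ slightly.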
