# Lab 01: FashionMNIST & Baseline Model

In this lab, we'll build our first computer vision model using PyTorch. We'll work with the FashionMNIST dataset and create a simple baseline model to classify clothing items.

**What we'll cover:**
1. Loading and exploring the FashionMNIST dataset
2. Visualizing image data
3. Creating DataLoaders for batch processing
4. Building a baseline linear model
5. Training the model
6. Evaluating model performance

## 1. Import Libraries

Let's start by importing all the necessary libraries for this lab.

### Library Explanations:

| Library | Purpose |
|---------|---------|
| **`torch`** | The core PyTorch library. Provides tensor operations, automatic differentiation, and the foundation for building neural networks. |
| **`torch.nn`** | Neural network module containing building blocks like layers (`Linear`, `Conv2d`), loss functions (`CrossEntropyLoss`), and the base `Module` class for creating models. |
| **`torchvision`** | PyTorch's computer vision library. Contains popular datasets, model architectures, and image transformations. |
| **`torchvision.datasets`** | Pre-built datasets like FashionMNIST, CIFAR-10, ImageNet. Handles downloading and loading data automatically. |
| **`torchvision.transforms.ToTensor`** | Converts PIL images or NumPy arrays to PyTorch tensors. Also scales pixel values from [0, 255] to [0.0, 1.0]. |
| **`torch.utils.data.DataLoader`** | Wraps a dataset and provides batching, shuffling, and parallel data loading. Essential for efficient training. |
| **`matplotlib.pyplot`** | Plotting library for visualizing images, loss curves, and other data. We use it to display sample images from the dataset. |

In [None]:
import torch
from torch import nn
import torchvision
from torchvision import datasets
from torchvision.transforms import ToTensor
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt

# Check PyTorch version
print(f"PyTorch version: {torch.__version__}")
print(f"Torchvision version: {torchvision.__version__}")

## 2. Load the FashionMNIST Dataset

Torchvision ships with many real-world vision datasets such as CIFAR and COCO ([full list](https://docs.pytorch.org/vision/stable/datasets.html)). For this notebook, we will use the FashionMNIST dataset. Fashion-MNIST is a dataset of Zalando’s article images consisting of 60,000 training examples and 10,000 test examples. Each example comprises a 28×28 grayscale image and an associated label from one of 10 classes.

![FashionMNIST](https://github.com/poridhiEng/lab-asset/blob/main/tensorcode/Deep-learning-with-pytorch/Computer-Vision/Lab_01/images/image-1.png?raw=true)

The `ToTensor()` transform converts the images from PIL format to PyTorch tensors and scales pixel values from [0, 255] to [0, 1].
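
To make the scaling step concrete, here is a minimal hand-rolled sketch of roughly what `ToTensor()` does to an 8-bit image. It is an approximation for illustration, not torchvision's actual implementation, and `fake_image_hwc` is a hypothetical random stand-in for a real picture.

```python
import torch

torch.manual_seed(0)

# Hypothetical stand-in for a grayscale picture: [height, width, channels], uint8 in [0, 255]
fake_image_hwc = torch.randint(0, 256, size=(28, 28, 1), dtype=torch.uint8)

# Approximate ToTensor(): move channels first, cast to float32, divide by 255
fake_image_chw = fake_image_hwc.permute(2, 0, 1).to(torch.float32) / 255.0

print(fake_image_chw.shape)  # torch.Size([1, 28, 28])
print(fake_image_chw.dtype)  # torch.float32
print(fake_image_chw.min().item(), fake_image_chw.max().item())  # both within [0.0, 1.0]
```

In the lab itself you only need to pass `transform=ToTensor()`; the sketch just shows where the `[0, 1]` range and the channels-first layout come from.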

In [None]:
# Download training data
train_data = datasets.FashionMNIST(
    root="data",           # Where to store the data
    train=True,            # Training set
    download=True,         # Download if not already present
    transform=ToTensor(),  # Convert to tensor
    target_transform=None  # No transform on labels
)

# Download test data
test_data = datasets.FashionMNIST(
    root="data",
    train=False,           # Test set
    download=True,
    transform=ToTensor()
)

print(f"Training samples: {len(train_data)}")
print(f"Test samples: {len(test_data)}")

## 3. Explore the Dataset

Let's examine the structure of our data - what does a single sample look like?

In [None]:
# Get a single sample
image, label = train_data[0]

print(f"Image shape: {image.shape}")
print(f"Image dtype: {image.dtype}")
print(f"Label: {label}")
print(f"Label type: {type(label)}")
print(train_data[0])

### Understanding the Shape

The shape of the image tensor is `[1, 28, 28]` or more specifically:

```
[color_channels=1, height=28, width=28]
```

Having `color_channels=1` means the image is grayscale. If `color_channels=3`, the image would have pixel values for red, green and blue; this is known as the RGB color model. The order of our current tensor is often referred to as `CHW` (Color Channels, Height, Width).

There's debate on whether images should be represented as `CHW` (color channels first) or `HWC` (color channels last).

> **Note:** You'll also see `NCHW` and `NHWC` formats where `N` stands for *number of images*. For example if you have a `batch_size=32`, your tensor shape may be `[32, 1, 28, 28]`. We'll cover batch sizes later.

PyTorch generally accepts `NCHW` (channels first) as the default for many operators.
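
The layouts above can be sketched with a few tensor operations. The tensors here are random placeholders (the `demo_` names are hypothetical), used only to show how the shapes change.

```python
import torch

demo_chw = torch.rand(1, 28, 28)               # [channels, height, width]
demo_hwc = demo_chw.permute(1, 2, 0)           # [height, width, channels]

demo_nchw = torch.stack([demo_chw] * 32)       # [N, channels, height, width]
demo_nhwc = demo_nchw.permute(0, 2, 3, 1)      # [N, height, width, channels]

print(demo_chw.shape)   # torch.Size([1, 28, 28])
print(demo_hwc.shape)   # torch.Size([28, 28, 1])
print(demo_nchw.shape)  # torch.Size([32, 1, 28, 28])
print(demo_nhwc.shape)  # torch.Size([32, 28, 28, 1])
```

`permute` only reorders dimensions; the pixel values themselves are unchanged.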

### Getting Class Names

FashionMNIST has 10 clothing categories. Instead of hardcoding the class names, we can get them directly from the dataset using the `.classes` attribute. This is a good practice as it:
- Ensures consistency with the actual dataset labels
- Makes code more reusable across different datasets

In [None]:
# Get class names directly from the dataset
# The .classes attribute returns a list of all class names in the dataset
class_names = train_data.classes
class_names

## 4. Visualize Sample Images

Let's visualize some samples from our dataset to get a better understanding of what we're working with.

In [None]:
# Plot a single image
image, label = train_data[0]

plt.figure(figsize=(4, 4))
plt.imshow(image.squeeze(), cmap="gray")  # squeeze() removes the channel dimension for plotting
plt.title(f"Label: {class_names[label]}")
plt.axis(False)
plt.show()

### Visualize Multiple Random Samples

Let's plot a grid of random images from our training data to get a better feel for what the dataset contains. This helps us understand the variety and quality of images we're working with.

In [None]:
# Plot multiple images in a 4x4 grid
torch.manual_seed(42)  # Set seed for reproducibility

# Create a figure with 4 rows and 4 columns of subplots
fig, axes = plt.subplots(4, 4, figsize=(9, 9))

# Loop through each subplot position
for i, ax in enumerate(axes.flatten()):
    # Generate a random index to pick a sample from training data
    random_idx = torch.randint(0, len(train_data), size=[1]).item()
    image, label = train_data[random_idx]
    
    # Display the image (squeeze removes the color channel dimension for plotting)
    ax.imshow(image.squeeze(), cmap="gray")
    ax.set_title(class_names[label], fontsize=10)  # Show class name as title
    ax.axis(False)  # Hide axis ticks

plt.tight_layout()
plt.show()

## 5. Create DataLoaders

DataLoaders help us:
- **Batch** our data (process multiple samples at once)
- **Shuffle** training data (prevent the model from learning the order of the samples)
- **Parallelize** data loading (faster training)

We'll use a batch size of 32, which is a common starting point.
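
As a rough illustration of how batching splits a dataset, the sketch below uses a small hypothetical `TensorDataset` of random tensors instead of FashionMNIST. It shows that the last batch can be smaller than the others and that the number of batches is the dataset size divided by the batch size, rounded up.

```python
import math

import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in dataset: 100 fake "images" with random labels
fake_images = torch.rand(100, 1, 28, 28)
fake_labels = torch.randint(0, 10, size=(100,))
fake_dataset = TensorDataset(fake_images, fake_labels)

fake_loader = DataLoader(fake_dataset, batch_size=32, shuffle=False)

fake_batch_sizes = [len(labels) for _, labels in fake_loader]
print(fake_batch_sizes)   # [32, 32, 32, 4]
print(len(fake_loader))   # 4

# The same arithmetic applied to the FashionMNIST split sizes
print(math.ceil(60_000 / 32), math.ceil(10_000 / 32))  # 1875 313
```

If a partial final batch is undesirable, `DataLoader` accepts `drop_last=True`; this lab keeps the default.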

In [None]:
# Set batch size
BATCH_SIZE = 32

# Create DataLoaders
train_dataloader = DataLoader(
    dataset=train_data,
    batch_size=BATCH_SIZE,
    shuffle=True  # Shuffle training data
)

test_dataloader = DataLoader(
    dataset=test_data,
    batch_size=BATCH_SIZE,
    shuffle=False  # Don't shuffle test data
)

print(f"Number of training batches: {len(train_dataloader)}")
print(f"Number of test batches: {len(test_dataloader)}")


In [None]:
# Let's examine a single batch
train_features_batch, train_labels_batch = next(iter(train_dataloader))

print(f"Batch of features shape: {train_features_batch.shape}")
print(f"Batch of labels shape: {train_labels_batch.shape}")

### Understanding Batch Shape

The batch shape `[32, 1, 28, 28]` means:
- **32**: Batch size (number of images)
- **1**: Color channels
- **28**: Height
- **28**: Width

This is the **NCHW** format (N = number of images in the batch, Channels, Height, Width).

## 6. Build the Baseline Model

Now let's create our first model! We'll build a simple baseline with:
- `nn.Flatten()`: Converts the 2D image to a 1D vector
- `nn.Linear()`: Fully connected layers
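
A quick shape check for the flatten step, using a random placeholder batch (`demo_batch` is a hypothetical name). By default `nn.Flatten()` starts flattening at dimension 1, so the batch dimension is preserved.

```python
import torch
from torch import nn

demo_flatten = nn.Flatten()                # default start_dim=1 keeps the batch dimension
demo_batch = torch.rand(32, 1, 28, 28)     # [N, C, H, W]
demo_flat = demo_flatten(demo_batch)

print(demo_batch.shape)  # torch.Size([32, 1, 28, 28])
print(demo_flat.shape)   # torch.Size([32, 784])
```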

![Baseline Model Architecture](https://raw.githubusercontent.com/poridhiEng/lab-asset/8104ff41aaf569aa65977e43cdbadc13fc1b7a34/tensorcode/Deep-learning-with-pytorch/Computer-Vision/Lab_01/images/infra-8.svg)

The diagram above shows how our baseline model processes image data. Each 28x28 grayscale image in a batch is first **flattened** into a 1D vector of 784 values (Input layer). This vector then passes through two **linear layers**: the first transforms 784 inputs to 10 hidden units, and the second produces 10 output values - one for each clothing category. Note that this baseline model uses **no activation functions** between layers.

![FashionMNIST-Baseline-model](https://raw.githubusercontent.com/poridhiEng/lab-asset/8104ff41aaf569aa65977e43cdbadc13fc1b7a34/tensorcode/Deep-learning-with-pytorch/Computer-Vision/Lab_01/images/infra-1.svg)

### How the Model Works

Looking at the architecture diagram above, our model processes data in the following steps:

1. **Input Image `[1, 28, 28]`**: A single grayscale image with 28×28 pixels (784 total pixel values)

2. **Flatten Layer `[784]`**: Converts the 2D image into a 1D vector. This "flattens" `[1, 28, 28]` → `[784]` so it can be fed into linear layers

3. **Input Layer `784`**: The flattened pixel values serve as input to the neural network

4. **First Linear Layer `784 → 10`**: Takes 784 input features and transforms them to 10 hidden units. This layer learns patterns in the flattened pixel data

5. **Second Linear Layer `10 → 10`**: Takes 10 hidden units and outputs 10 values (one for each clothing class)

6. **Output Logits `[10]`**: Raw prediction scores for each of the 10 classes:
   - 0: T-shirt/top
   - 1: Trouser
   - 2: Pullover
   - 3: Dress
   - 4: Coat
   - 5: Sandal
   - 6: Shirt
   - 7: Sneaker
   - 8: Bag
   - 9: Ankle boot

> **Note:** This baseline model uses **NO activation functions** between layers. This is intentional - we want to establish a simple baseline first. In Lab 02, we'll explore what happens when we add non-linearity (ReLU).
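
One consequence of having no activation between the layers is that the two linear layers compose into a single linear map. The sketch below checks this numerically; the `demo_` layers are hypothetical stand-ins with the same sizes as the baseline, not the lab's model instance.

```python
import torch
from torch import nn

torch.manual_seed(0)
demo_layer_1 = nn.Linear(784, 10)
demo_layer_2 = nn.Linear(10, 10)

# Fold both layers into one: W = W2 @ W1, b = W2 @ b1 + b2
demo_combined = nn.Linear(784, 10)
with torch.no_grad():
    demo_combined.weight.copy_(demo_layer_2.weight @ demo_layer_1.weight)
    demo_combined.bias.copy_(demo_layer_2.weight @ demo_layer_1.bias + demo_layer_2.bias)

demo_x = torch.rand(4, 784)
with torch.no_grad():
    stacked_out = demo_layer_2(demo_layer_1(demo_x))
    combined_out = demo_combined(demo_x)

# Expected to agree up to floating-point rounding
print(torch.allclose(stacked_out, combined_out, atol=1e-5))
```

So the baseline can represent only linear decision functions of the pixels, which is part of what makes it a useful reference point.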

In [None]:
class FashionMNISTModelV0(nn.Module):
    """Baseline model with only linear layers (no non-linearity)."""
    
    def __init__(self, input_shape: int, hidden_units: int, output_shape: int):
        super().__init__()
        
        self.layer_stack = nn.Sequential(
            nn.Flatten(),  # Flatten image: [1, 28, 28] -> [784]
            nn.Linear(in_features=input_shape, out_features=hidden_units),
            nn.Linear(in_features=hidden_units, out_features=output_shape)
        )
    
    def forward(self, x):
        return self.layer_stack(x)

In [None]:
# Create model instance
torch.manual_seed(42)

model_0 = FashionMNISTModelV0(
    input_shape=28*28,          # 784 pixels
    hidden_units=10,            # Hidden layer size
    output_shape=len(class_names)  # 10 classes
)

print(model_0)
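
To get a feel for the size of this architecture, the following sketch counts learnable parameters for an equivalent stack of layers. `sketch_model` is a hypothetical copy of the layer sizes, built separately so it does not touch `model_0`.

```python
import torch
from torch import nn

sketch_model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 10),   # 784 * 10 weights + 10 biases = 7850
    nn.Linear(10, 10),    # 10 * 10 weights + 10 biases  = 110
)

sketch_param_count = sum(p.numel() for p in sketch_model.parameters())
print(sketch_param_count)  # 7960
```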

### Test the Model with a Dummy Input

Before training, let's verify our model works by passing a dummy input through it.

In [None]:
# Create dummy input (a batch containing a single image)
dummy_input = torch.randn(1, 1, 28, 28)  # [batch, channels, height, width]

# Forward pass
with torch.inference_mode():
    dummy_output = model_0(dummy_input)

print(f"Input shape: {dummy_input.shape}")
print(f"Output shape: {dummy_output.shape}")
print(f"Output (logits): {dummy_output}")

The output has shape `[1, 10]` - one row of 10 values (one for each class). These raw values are called **logits**. To get probabilities, we could apply softmax, but `nn.CrossEntropyLoss` handles this internally.
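
The relationship between logits, softmax, and the loss can be checked on random values. The tensors below are illustrative placeholders (`demo_logits` and `demo_targets` are hypothetical), not model outputs.

```python
import torch
import torch.nn.functional as F
from torch import nn

torch.manual_seed(0)
demo_logits = torch.randn(4, 10)             # 4 samples, 10 classes
demo_targets = torch.tensor([0, 3, 9, 1])

# Softmax turns each row of logits into probabilities that sum to 1
demo_probs = torch.softmax(demo_logits, dim=1)
print(demo_probs.sum(dim=1))

# CrossEntropyLoss on raw logits matches log-softmax followed by negative log-likelihood
loss_from_logits = nn.CrossEntropyLoss()(demo_logits, demo_targets)
loss_manual = F.nll_loss(F.log_softmax(demo_logits, dim=1), demo_targets)
print(torch.allclose(loss_from_logits, loss_manual))
```

This is why the model returns logits and no softmax layer appears in the network.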

## 7. Setup Loss Function and Optimizer

For multi-class classification:
- **Loss function**: `nn.CrossEntropyLoss()` - compares the model's raw logits to the true labels (it applies log-softmax internally)
- **Optimizer**: `torch.optim.SGD()` - updates weights using stochastic gradient descent
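
For intuition about what a plain SGD update does, here is a one-parameter sketch. The names `toy_weight` and `toy_optimizer` are hypothetical and the loss is a simple square, chosen so the arithmetic can be done by hand.

```python
import torch

toy_weight = torch.tensor([2.0], requires_grad=True)
toy_optimizer = torch.optim.SGD([toy_weight], lr=0.1)

toy_loss = (toy_weight ** 2).sum()   # loss = w^2
toy_loss.backward()                  # gradient = 2 * w = 4
toy_optimizer.step()                 # w_new = w - lr * grad = 2 - 0.1 * 4

print(toy_weight.item())             # approximately 1.6
```

With no momentum or weight decay, each step is just the current value minus the learning rate times the gradient.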

In [None]:
# Setup loss function
loss_fn = nn.CrossEntropyLoss()

# Setup optimizer
optimizer = torch.optim.SGD(
    params=model_0.parameters(),
    lr=0.1  # Learning rate
)

## 8. Create Accuracy Function

We need a way to measure how well our model is performing. Accuracy tells us the percentage of correct predictions.

In [None]:
def accuracy_fn(y_true, y_pred):
    """Calculate accuracy between true labels and predictions.
    
    Args:
        y_true: True labels
        y_pred: Predicted labels
    
    Returns:
        Accuracy as a percentage
    """
    correct = torch.eq(y_true, y_pred).sum().item()
    total = len(y_true)
    accuracy = (correct / total) * 100
    return accuracy
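
A small worked example may help. It uses a standalone copy of the same calculation (named `accuracy_sketch` here so it does not shadow `accuracy_fn`) on hand-written logits for a hypothetical 3-class problem.

```python
import torch

def accuracy_sketch(y_true, y_pred):
    """Percentage of positions where prediction equals label."""
    correct = torch.eq(y_true, y_pred).sum().item()
    return correct / len(y_true) * 100

toy_logits = torch.tensor([[2.0, 0.1, 0.3],
                           [0.2, 0.1, 1.5],
                           [0.0, 0.2, 3.0],
                           [0.1, 2.2, 0.4]])
toy_labels = torch.tensor([0, 1, 2, 1])

# argmax picks the class with the highest logit in each row
toy_preds = toy_logits.argmax(dim=1)
print(toy_preds)                               # tensor([0, 2, 2, 1])
print(accuracy_sketch(toy_labels, toy_preds))  # 75.0
```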

## 9. Create Timing Function

Let's create a helper function to track training time. This will be useful for comparing different models later.

In [None]:
from timeit import default_timer as timer

def print_train_time(start: float, end: float):
    """Print training time.
    
    Args:
        start: Start time of training
        end: End time of training
    
    Returns:
        Total training time in seconds
    """
    total_time = end - start
    print(f"Train time: {total_time:.3f} seconds")
    return total_time

## 10. Training the Model

Now comes the exciting part - training our model! 

![Training Loop](https://raw.githubusercontent.com/poridhiEng/lab-asset/8104ff41aaf569aa65977e43cdbadc13fc1b7a34/tensorcode/Deep-learning-with-pytorch/Computer-Vision/Lab_01/images/infra-2.svg)

### Understanding the Training Loop

As shown in the diagram above, training happens over multiple **epochs** (complete passes through the dataset). Within each epoch, we process the data in **batches** and repeat the following steps:

1. **Zero Gradients** (`optimizer.zero_grad()`): Clear gradients from the previous step to prevent accumulation
2. **Forward Pass** (`model(features)`): Pass input data through the model to get predictions
3. **Compute Loss** (`loss_fn(pred, target)`): Calculate how wrong our predictions are
4. **Backward Pass** (`loss.backward()`): Compute gradients of loss with respect to model parameters
5. **Optimizer Step** (`optimizer.step()`): Update model weights based on gradients

This cycle repeats for each batch, and then moves to the next epoch.
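
To see why the zero-gradients step matters, the sketch below calls `backward()` twice on a hypothetical single parameter. It zeroes the gradient directly on the tensor rather than through an optimizer, purely to keep the example short.

```python
import torch

toy_param = torch.tensor([1.0], requires_grad=True)

# First backward pass: d(3w)/dw = 3
(3 * toy_param).sum().backward()
first_grad = toy_param.grad.clone()

# Second backward pass without zeroing: the new gradient is added to the old one
(3 * toy_param).sum().backward()
accumulated_grad = toy_param.grad.clone()

# Zero the gradient, then run backward again: back to 3
toy_param.grad.zero_()
(3 * toy_param).sum().backward()
fresh_grad = toy_param.grad.clone()

print(first_grad, accumulated_grad, fresh_grad)  # tensor([3.]) tensor([6.]) tensor([3.])
```

Without clearing, each batch's update would be based on a sum of gradients from earlier batches as well.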

We'll train for 3 epochs (one epoch = one pass through entire training dataset).

In [None]:
# Set random seed for reproducibility
torch.manual_seed(42)

# Start timer
train_time_start = timer()

# Set number of epochs
epochs = 3

# Training loop
for epoch in range(epochs):
    print(f"\nEpoch: {epoch}\n---------")
    
    # --- Training Phase ---
    train_loss = 0
    
    # Put model in training mode once per epoch (eval() is set again in the testing phase)
    model_0.train()

    # Loop through training batches
    for batch, (X, y) in enumerate(train_dataloader):
        
        # 1. Zero gradients (clear from previous step)
        optimizer.zero_grad()
        
        # 2. Forward pass
        y_pred = model_0(X)
        
        # 3. Calculate loss (per batch)
        loss = loss_fn(y_pred, y)
        train_loss += loss.item()  # Accumulate loss
        
        # 4. Backward pass (compute gradients)
        loss.backward()
        
        # 5. Optimizer step (update weights)
        optimizer.step()
        
        # Print progress every 400 batches
        if batch % 400 == 0:
            print(f"Looked at {batch * len(X)}/{len(train_dataloader.dataset)} samples")
    
    # Calculate average training loss per epoch
    train_loss /= len(train_dataloader)
    
    # --- Testing Phase ---
    test_loss, test_acc = 0, 0
    
    # Put model in evaluation mode
    model_0.eval()
    
    with torch.inference_mode():
        for X, y in test_dataloader:
            # Forward pass
            test_pred = model_0(X)
            
            # Accumulate loss and accuracy
            test_loss += loss_fn(test_pred, y).item()
            test_acc += accuracy_fn(
                y_true=y,
                y_pred=test_pred.argmax(dim=1)  # Convert logits to predictions
            )
    
    # Calculate averages
    test_loss /= len(test_dataloader)
    test_acc /= len(test_dataloader)
    
    print(f"\nTrain loss: {train_loss:.4f}")
    print(f"Test loss: {test_loss:.4f} | Test accuracy: {test_acc:.2f}%")

# End timer
train_time_end = timer()

# Print total training time
total_train_time = print_train_time(
    start=train_time_start,
    end=train_time_end
)

## 11. Create Reusable Evaluation Function

We already evaluated our model during training, but let's create a **reusable function** that we can use to compare multiple models in future labs. This function returns a dictionary with the model's metrics, making it easy to compare results.

In [None]:
def eval_model(model: nn.Module,
               data_loader: DataLoader,
               loss_fn: nn.Module,
               accuracy_fn):
    """Evaluate model on a dataset.
    
    Args:
        model: PyTorch model to evaluate
        data_loader: DataLoader for the dataset
        loss_fn: Loss function
        accuracy_fn: Accuracy function
    
    Returns:
        Dictionary with model name, loss, and accuracy
    """
    loss, acc = 0, 0
    
    model.eval()
    with torch.inference_mode():
        for X, y in data_loader:
            # Forward pass
            y_pred = model(X)
            
            # Accumulate metrics
            loss += loss_fn(y_pred, y).item()
            acc += accuracy_fn(
                y_true=y,
                y_pred=y_pred.argmax(dim=1)
            )
    
    # Calculate averages
    loss /= len(data_loader)
    acc /= len(data_loader)
    
    return {
        "model_name": model.__class__.__name__,
        "model_loss": loss,
        "model_acc": acc
    }
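
One detail worth knowing about this style of evaluation: it averages per-batch accuracies, which can differ slightly from per-sample accuracy when the last batch is smaller (10,000 test images with a batch size of 32 leave a final batch of 16). The numbers below are made up to exaggerate the effect.

```python
# Hypothetical batch sizes and correct counts, chosen to make the gap visible
toy_batch_sizes = [32, 32, 16]
toy_batch_correct = [32, 16, 4]

# Mean of per-batch accuracies (how the evaluation loop aggregates)
per_batch_acc = [c / n * 100 for c, n in zip(toy_batch_correct, toy_batch_sizes)]
mean_of_batches = sum(per_batch_acc) / len(per_batch_acc)

# Per-sample accuracy (total correct over total samples)
per_sample_acc = sum(toy_batch_correct) / sum(toy_batch_sizes) * 100

print(f"{mean_of_batches:.2f}")  # 58.33
print(f"{per_sample_acc:.2f}")   # 65.00
```

On the real test set only one of 313 batches is short, so the difference is expected to be small; it is still worth remembering when comparing numbers across batch sizes.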

In [None]:
# Evaluate our baseline model and store results for comparison
model_0_results = eval_model(
    model=model_0,
    data_loader=test_dataloader,
    loss_fn=loss_fn,
    accuracy_fn=accuracy_fn
)

# Display results (should match our training output above)
print("Baseline Model Results:")
print(f"Model: {model_0_results['model_name']}")
print(f"Loss: {model_0_results['model_loss']:.4f}")
print(f"Accuracy: {model_0_results['model_acc']:.2f}%")

## 12. Make Predictions on Sample Images

Let's visualize how our model performs on some random test images.

In [None]:
# Get random samples from test data
torch.manual_seed(42)

fig, axes = plt.subplots(3, 3, figsize=(9, 9))

model_0.eval()
with torch.inference_mode():
    for i, ax in enumerate(axes.flatten()):
        # Get random sample
        random_idx = torch.randint(0, len(test_data), size=[1]).item()
        image, true_label = test_data[random_idx]
        
        # Make prediction
        pred_logits = model_0(image.unsqueeze(0))  # Add batch dimension
        pred_label = pred_logits.argmax(dim=1).item()
        
        # Plot
        ax.imshow(image.squeeze(), cmap="gray")
        
        # Color title based on correct/incorrect
        title_color = "green" if pred_label == true_label else "red"
        ax.set_title(
            f"True: {class_names[true_label]}\nPred: {class_names[pred_label]}",
            color=title_color,
            fontsize=10
        )
        ax.axis(False)

plt.tight_layout()
plt.show()
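
The prediction cell relies on two shape changes: `unsqueeze(0)` adds the batch dimension the model expects, and `squeeze()` drops size-1 dimensions for plotting. A minimal sketch with a random placeholder image (`demo_single_image` is a hypothetical name):

```python
import torch

demo_single_image = torch.rand(1, 28, 28)          # [C, H, W], as returned by the dataset

demo_as_batch = demo_single_image.unsqueeze(0)     # [1, C, H, W] for the model
demo_for_plot = demo_single_image.squeeze()        # [H, W] for plt.imshow

print(demo_as_batch.shape)  # torch.Size([1, 1, 28, 28])
print(demo_for_plot.shape)  # torch.Size([28, 28])
```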

## Summary

Congratulations! You've built your first computer vision model!

### What We Accomplished:
1. **Loaded FashionMNIST** - 60,000 training + 10,000 test images
2. **Explored the data** - 28x28 grayscale images, 10 classes
3. **Created DataLoaders** - Batch size of 32 for efficient training
4. **Built a baseline model** - Simple linear layers (no activation functions)
5. **Trained for 3 epochs** - Using CrossEntropyLoss and SGD optimizer
6. **Achieved ~80% accuracy** - Our baseline performance!

### What's Next?
In Lab 02, we'll:
- Add **non-linear activation functions** (ReLU)
- Explore why adding non-linearity might not always help!