# Building Your First Image Classifier

Welcome to this hands-on lab, where you'll build, train, and evaluate your first image classifier using PyTorch! You’ve already learned the core concepts behind the deep learning pipeline, and now it’s time to put that knowledge into practice. Your goal is to create a neural network that recognizes handwritten digits from the **MNIST dataset**.

By the end of this notebook, you will have worked through the entire process from start to finish. Specifically, you will:
- **Prepare your data:** Load the MNIST dataset, inspect its format, and apply essential transformations.  
- **Build your model:** Define a custom neural network using PyTorch’s flexible `nn.Module` class.  
- **Train your model:** Implement the full training process with a loss function, optimizer, and training loop.  
- **Analyze your results:** Evaluate your model on unseen data and visualize its performance.

Let's get started!

## Imports

```python
import numpy as np

import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt

import helper_utils
```

* The code below selects the best available hardware on your system to speed up model training.
    * **CUDA**: Runs on NVIDIA GPUs, which are widely used for deep learning and typically offer the fastest training performance. 
    * **MPS**: Runs on Apple Silicon GPUs, providing efficient acceleration on modern Mac systems.
    * **CPU**: Runs on the Central Processing Unit, the standard processor every computer has. The code falls back to it when no compatible GPU is detected.

```python
# Prefer an NVIDIA GPU, then an Apple Silicon GPU, and fall back to the CPU.
if torch.cuda.is_available():
    device = torch.device("cuda")
    print("Using device: CUDA")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
    print("Using device: MPS (Apple Silicon GPU)")
else:
    device = torch.device("cpu")
    print("Using device: CPU")
```

```text
Using device: CPU
```


## MNIST Dataset: Preparing Your Data

The [MNIST dataset](https://docs.pytorch.org/vision/main/generated/torchvision.datasets.MNIST.html) is a classic benchmark for image classification and is often considered the “hello world” of computer vision. It contains 60,000 training images and 10,000 test images. Each image is 28 by 28 pixels, in grayscale, showing a single handwritten digit from 0 to 9.

Before a model can learn from this data, you need to convert the images into numbers that a neural network can process. Each pixel becomes a numerical value representing its brightness, and together these numbers form a **tensor**, which is the format PyTorch models use for computation. Neural networks train best when these input values are small and centered around zero, because that helps gradients flow more smoothly and makes learning more stable. To achieve this, you will **normalize** the pixel values so they are small and centered around zero.
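As a worked example of what normalization does to the numbers, the sketch below applies the two steps a typical torchvision pipeline uses: scaling raw 8-bit brightness values (0 to 255) into [0, 1], as `transforms.ToTensor()` does, and then standardizing with `(x - mean) / std`, as `transforms.Normalize()` does. It uses plain NumPy so it runs without downloading anything. The 3×3 pixel patch is made up, and the mean and standard deviation (0.1307 and 0.3081, the values commonly quoted for the MNIST training set) are assumptions; the transform you build in this lab may use different values.

```python
import numpy as np

# Hypothetical 3x3 patch of raw MNIST-style pixel brightness values (0-255).
raw = np.array([[0,   0, 255],
                [0, 128, 255],
                [0,   0,  64]], dtype=np.uint8)

# Step 1: scale to [0, 1], mirroring what transforms.ToTensor() does to uint8 images.
scaled = raw.astype(np.float32) / 255.0

# Step 2: standardize, mirroring transforms.Normalize(): x_norm = (x - mean) / std.
# Assumed MNIST training-set statistics (not computed from this tiny patch).
MNIST_MEAN, MNIST_STD = 0.1307, 0.3081
normalized = (scaled - MNIST_MEAN) / MNIST_STD

print(normalized.round(3))
```

After this step, black background pixels become roughly -0.42 and fully white pixels roughly 2.82, so the values are centered near zero instead of all sitting in [0, 1].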

To see exactly what this means, you’ll start by loading the MNIST training data directly from `torchvision`, PyTorch’s library for computer vision tasks. You won’t apply any transformations yet. Instead, you’ll inspect a raw image to see what the data looks like before any changes, and later compare how that same image appears once it’s been prepared for the model.

- Define `data_path`, which specifies the folder where your dataset will be stored.

```python
# Set the path to store the dataset files
data_path = "./data"
```

* Load the MNIST dataset using `torchvision.datasets.MNIST`.
    * `root`: This tells PyTorch where to save the dataset files. Here, that is the `data_path` folder you just defined.
    * `train`: Setting this to `True` ensures you get the training split of the dataset, which contains 60,000 images.
    * `download`: This handy parameter tells PyTorch to automatically download the files if they are not already present in your root folder.

```python
train_dataset_without_transform = torchvision.datasets.MNIST(root=data_path, train=True, download=True)
```


Now that you've loaded the dataset, you can inspect an individual item from it.

* You can retrieve any sample from your `Dataset` object just like a Python list by using its index. Here, you'll access the first item at index `0`.
* Notice that each item is a **tuple** containing two parts: the image data and its corresponding numerical label.
* After running the code, you should see the following:
    * The image is a **PIL Image** object, a common Python format for image data.
    * Its dimensions are **(28, 28)**, which matches the MNIST image size.
    * The label is an integer representing the digit shown in the image.
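To make the indexing behavior concrete without downloading anything, here is a minimal sketch of the map-style dataset protocol that `torchvision.datasets.MNIST` follows: a class with `__len__` and `__getitem__`, where indexing returns an `(image, label)` tuple. `TinyDigitDataset` is a hypothetical stand-in that returns random NumPy arrays instead of PIL Images, so only the tuple structure, the 28 by 28 shape, and the 0 to 9 label range mirror MNIST.

```python
import numpy as np


class TinyDigitDataset:
    """Hypothetical stand-in for a map-style dataset such as MNIST.

    Like torchvision.datasets.MNIST, indexing returns an (image, label) tuple.
    Unlike MNIST, the images are random NumPy arrays, not PIL Images.
    """

    def __init__(self, num_samples=5, size=28):
        rng = np.random.default_rng(seed=0)
        # Random uint8 "pixel" grids in the 0-255 brightness range.
        self.images = rng.integers(0, 256, size=(num_samples, size, size), dtype=np.uint8)
        # Integer labels 0-9, like the MNIST digit classes.
        self.labels = rng.integers(0, 10, size=num_samples)

    def __len__(self):
        return len(self.images)

    def __getitem__(self, index):
        # Return the image and a plain Python int label, as a tuple.
        return self.images[index], int(self.labels[index])


dataset = TinyDigitDataset()
sample = dataset[0]      # index it just like a Python list
image, label = sample    # unpack the (image, label) tuple
print(type(sample).__name__, image.shape, label)
```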
    
> **A Note on Labels**: 
>
>    Datasets in PyTorch return labels as **numerical indices**, not text. For the **MNIST dataset**, this is straightforward: index `0` represents the digit `0`, index `1` represents the digit `1`, and so on.
>
>    This relationship between label indices and their meanings is less direct in other datasets. For example, if you were working with images of cats and dogs, the labels would still be numbers such as 0 and 1, not the words “cat” or “dog.” In those cases, you might create a list such as `class_names = ['cat', 'dog']` to map the numeric labels back to readable names when displaying results or debugging.
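The mapping described in the note amounts to list indexing: each numeric label is the position of its readable name in the list. The sketch below uses a hypothetical `class_names` list for a cats-and-dogs dataset and made-up labels; for MNIST, the same idea reduces to the identity mapping from index to digit.

```python
# Hypothetical class names for a two-class dataset: list position == numeric label.
class_names = ['cat', 'dog']

# Made-up numeric labels, e.g. from a dataset or a model's predictions.
numeric_labels = [1, 0, 0, 1]
readable = [class_names[i] for i in numeric_labels]
print(readable)

# For MNIST, label i simply means the digit i.
mnist_class_names = [str(digit) for digit in range(10)]
print(mnist_class_names[7])
```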