<a href="https://colab.research.google.com/github/michaelfyy/intro-to-deep-learning/blob/main/deep_learning_tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Deep Learning & Computer Vision Workshop

### MLP on MNIST using PyTorch

Welcome to this interactive tutorial! In this notebook, we will introduce some general concepts in deep learning and computer vision and then build a simple Multi-Layer Perceptron (MLP) to classify handwritten digits from the MNIST dataset.

## Overview

- **Deep Learning:** A branch of machine learning that uses neural networks with many layers to learn complex patterns.
- **Computer Vision:** A field of artificial intelligence focused on interpreting visual information from images or videos.
- **Multi-Layer Perceptron (MLP):** One of the foundational neural network models that consists of an input layer, one or more hidden layers, and an output layer.

In this tutorial, we'll:
1. Visualize the MNIST dataset
2. Build and train a simple MLP
3. Evaluate the overall accuracy of the model
4. Visualize predictions

In [None]:
# Import necessary libraries
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np

# Check if CUDA (GPU) is available; otherwise, use CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f'Using device: {device}')

In [None]:
# Define a transform that converts each image to a float tensor of shape (1, 28, 28) with values scaled to [0, 1]
transform = transforms.ToTensor()

# Download and load the MNIST training and test datasets
train_dataset = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = torchvision.datasets.MNIST(root='./data', train=False, download=True, transform=transform)

# Create data loaders for batching
train_loader = torch.utils.data.DataLoader(dataset=train_dataset, batch_size=64, shuffle=True)
test_loader = torch.utils.data.DataLoader(dataset=test_dataset, batch_size=1000, shuffle=False)
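
As a quick sanity check on the loaders above, the number of batches per epoch follows from the dataset size and the batch size. The sketch below assumes the standard MNIST split of 60,000 training and 10,000 test images, and that `drop_last` is left at its default of `False` so the final partial batch is kept. `batches_per_epoch` is a helper name introduced here for illustration only.

```python
import math

def batches_per_epoch(num_samples, batch_size):
    """Number of batches when the last partial batch is kept."""
    return math.ceil(num_samples / batch_size)

train_batches = batches_per_epoch(60_000, 64)     # matches batch_size=64 above
test_batches = batches_per_epoch(10_000, 1000)    # matches batch_size=1000 above

# Size of the final (partial) training batch
last_train_batch = 60_000 - (train_batches - 1) * 64

print(train_batches, test_batches, last_train_batch)
```

If the assumptions hold, `len(train_loader)` and `len(test_loader)` should report the same batch counts.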

## Visualizing the MNIST Dataset

Let's take a look at some examples from the MNIST dataset to see the handwritten digits.

In [None]:
# Function to display an image
def imshow(img):
    npimg = img.numpy()
    # If the image has one channel, squeeze it; otherwise, transpose to HWC format
    if npimg.shape[0] == 1:
        npimg = npimg.squeeze(0)
        plt.imshow(npimg, cmap='gray')
    else:
        npimg = np.transpose(npimg, (1, 2, 0))
        plt.imshow(npimg)
    plt.axis('off')
    plt.show()

# Get a batch of training images
dataiter = iter(train_loader)
images, labels = next(dataiter)

# Display the first 8 images
imshow(torchvision.utils.make_grid(images[:8]))
print('GroundTruth:', ' '.join(str(labels[j].item()) for j in range(8)))

## Building the MLP Model

We'll build a simple neural network with one hidden layer of 128 units. The input layer has 784 neurons (each 28x28 MNIST image is flattened into 28 × 28 = 784 values), and the output layer has 10 neurons (one score for each digit class).
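
To get a feel for the size of this network, the arithmetic below counts its trainable parameters. It assumes the hidden width of 128 used as the default in this notebook's model and that each fully connected layer has a bias term; `linear_params` is a hypothetical helper, not part of PyTorch.

```python
def linear_params(in_features, out_features):
    """Weights plus biases of one fully connected layer."""
    return in_features * out_features + out_features

input_size = 28 * 28   # 784 pixel values per flattened image
hidden_size = 128      # assumed hidden width (the model's default)
num_classes = 10       # one output per digit

fc1_params = linear_params(input_size, hidden_size)
fc2_params = linear_params(hidden_size, num_classes)
total_params = fc1_params + fc2_params

print(fc1_params, fc2_params, total_params)
```

Once the model is instantiated, `sum(p.numel() for p in model.parameters())` should give a matching total.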

In [None]:
class MLP(nn.Module):
    def __init__(self, input_size=784, hidden_size=128, num_classes=10):
        super().__init__()
        # First fully connected layer
        self.fc1 = nn.Linear(input_size, hidden_size)
        # Activation function
        self.relu = nn.ReLU()
        # Output layer
        self.fc2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # Flatten the image tensor into a vector
        x = x.view(x.size(0), -1)
        # Forward pass through the first layer and activation
        out = self.fc1(x)
        out = self.relu(out)
        # Pass through the output layer
        out = self.fc2(out)
        return out

# Instantiate the model and move it to the selected device
model = MLP().to(device)
print(model)
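
To make the `forward` method above more concrete, here is a dependency-free sketch of the same computation (linear layer, ReLU, linear layer) on a toy network with hand-picked weights. The sizes and values are made up for illustration, and the helper names are hypothetical. Weights are stored as one row per output unit, which mirrors how `nn.Linear` lays out its weight matrix.

```python
def relu(values):
    """Element-wise max(0, x)."""
    return [max(0.0, v) for v in values]

def linear(x, weight, bias):
    """Compute W x + b, with one weight row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]

def mlp_forward(x, w1, b1, w2, b2):
    """Hidden layer with ReLU, followed by an output layer producing raw scores."""
    hidden = relu(linear(x, w1, b1))
    return linear(hidden, w2, b2)

# Toy network: 3 inputs -> 2 hidden units -> 2 outputs
w1 = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
b1 = [0.0, -1.0]
w2 = [[1.0, 2.0], [-1.0, 1.0]]
b2 = [0.5, 0.0]

logits = mlp_forward([1.0, 2.0, 3.0], w1, b1, w2, b2)
print(logits)
```

The first hidden unit has a negative pre-activation, so ReLU zeroes it out and only the second hidden unit contributes to the outputs.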

## Training the Model

We now set up a cross-entropy loss and an Adam optimizer, then train the MLP on the MNIST training data for 20 epochs.

In [None]:
# Define the loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Set the number of epochs
num_epochs = 20

for epoch in range(num_epochs):
    running_loss = 0.0
    for i, (images, labels) in enumerate(train_loader):
        # Move images and labels to the device (CPU or GPU)
        images, labels = images.to(device), labels.to(device)

        # Forward pass: Compute predicted outputs by passing inputs to the model
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward pass: Zero gradients, perform backpropagation, and update weights
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        # Every 100 batches, report the mean loss over those batches and reset the accumulator
        if (i + 1) % 100 == 0:
            print(f'Epoch [{epoch+1}/{num_epochs}], Step [{i+1}/{len(train_loader)}], Loss: {running_loss/100:.4f}')
            running_loss = 0.0

print('Finished Training')
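
`nn.CrossEntropyLoss` expects raw scores (logits) and integer class labels, and applies log-softmax internally, which is why the model's `forward` has no softmax. The sketch below computes the same per-sample quantity in plain Python, assuming a single example rather than a batch; `cross_entropy` here is an illustrative function, not the PyTorch implementation.

```python
import math

def cross_entropy(logits, target):
    """Negative log-probability of the target class under softmax(logits)."""
    m = max(logits)  # subtract the max before exponentiating, for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

# An untrained 10-class model that scores every class equally
uniform_loss = cross_entropy([0.0] * 10, target=3)

# A confident, correct prediction
confident_loss = cross_entropy([0.0, 0.0, 10.0], target=2)

print(f'{uniform_loss:.4f} {confident_loss:.6f}')
```

With ten equally scored classes the loss equals ln(10), about 2.30, so a loss near that value in the first training steps is expected, and it should fall toward zero as predictions become confident and correct.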

## Evaluating the Model

After training, we evaluate the model on the test dataset and visualize some predictions.

In [None]:
# Set the model to evaluation mode
model.eval()

# Initialize variables to track accuracy
correct = 0
total = 0

with torch.no_grad():
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        # The predicted class is the index of the highest score
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

accuracy = 100 * correct / total
print(f'Accuracy of the model on the {total} test images: {accuracy:.2f}%')
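
The accuracy computation above boils down to taking the index of the largest score in each row and comparing it with the label. A minimal plain-Python sketch of that logic follows, using made-up scores for four samples and three classes; the function names are hypothetical.

```python
def argmax(scores):
    """Index of the largest score (first one wins on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])

def accuracy_percent(batch_scores, labels):
    """Percentage of rows whose highest-scoring class matches the label."""
    predicted = [argmax(row) for row in batch_scores]
    correct = sum(p == y for p, y in zip(predicted, labels))
    return 100 * correct / len(labels)

# Made-up scores for 4 samples over 3 classes
scores = [
    [0.1, 2.0, -1.0],   # predicts class 1
    [1.5, 0.2, 0.3],    # predicts class 0
    [0.0, 0.1, 3.0],    # predicts class 2
    [2.0, 2.5, 0.0],    # predicts class 1 (label is 0, so this one is wrong)
]
labels = [1, 0, 2, 0]

acc = accuracy_percent(scores, labels)
print(f'{acc:.2f}%')
```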

In [None]:
# Visualize predictions for a batch of test images
dataiter = iter(test_loader)
images, labels = next(dataiter)
images, labels = images.to(device), labels.to(device)
# Inference only, so disable gradient tracking
with torch.no_grad():
    outputs = model(images)
    _, predicted = torch.max(outputs, 1)

# Move images to CPU for visualization
images = images.cpu()

# Display the first 8 test images along with their predicted and true labels
imshow(torchvision.utils.make_grid(images[:8]))
print('Predicted:  ', ' '.join(str(predicted[j].item()) for j in range(8)))
print('GroundTruth:', ' '.join(str(labels[j].item()) for j in range(8)))

## Conclusion

In this tutorial, we explored the basics of deep learning and computer vision by building an MLP to classify MNIST digits. We visualized the dataset, built and trained the model, and evaluated its accuracy on the test set. This is just the beginning: there is a vast world of deep learning techniques waiting to be explored!