## Applied Deep Learning in Python using PyTorch

This document outlines the process of training a Convolutional Neural Network (CNN) for image classification using PyTorch. It follows a structured workflow, starting with data loading and preprocessing, followed by model definition, training, evaluation, and testing.

The dataset used in this project is MNIST, a collection of grayscale images of handwritten digits (0-9). All images are converted into PyTorch tensors for processing. The CNN architecture is designed for grayscale images and is trained using a loss function and an optimizer to improve its accuracy. The training loop includes validation steps to monitor the model’s performance.

Additionally, the notebook checks for GPU availability. If a GPU is detected, computations are offloaded for faster processing; otherwise, training runs on the CPU. After training, the model is evaluated on a separate test dataset to assess its classification accuracy.

This document is based on a template provided by Steven Edwards and serves as a foundation for image classification tasks using PyTorch. It can be adapted for other deep learning projects involving image recognition.

#### 1. Installing Packages and Connecting to GPU

1.1 Install PyTorch with CUDA 12.1:

In [None]:
# pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

If you get "Out of Memory" errors when training your model, run the following code to release cached GPU memory:

In [None]:
#import torch
#torch.cuda.empty_cache()

The code below prints the PyTorch and CUDA versions, checks for CUDA availability, prints the GPU name if one is available, and assigns the computing device (cuda:0 or cpu) to device for later use when moving the network and data to that device.

1.2 Connecting to GPU:

In [None]:
import torch

print(torch.__version__)  # Check PyTorch version
print(torch.version.cuda)  # Check CUDA version in PyTorch
print(torch.backends.cudnn.enabled)  # Check if cuDNN is enabled

print(torch.cuda.is_available())  # True if CUDA is available
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # GPU name (this call raises an error without a CUDA device)

device = "cuda:0" if torch.cuda.is_available() else "cpu"

#### 2. Loading and Transforming the `MNIST` Dataset

For this project, I'm using the MNIST dataset, a collection of 28×28 grayscale images of handwritten digits (0-9). Instead of manually downloading and importing the dataset, I use torchvision.datasets.MNIST, which downloads the data on first use and exposes it as a PyTorch dataset.

To prepare the dataset, I use PyTorch's built-in ToTensor transform, which converts the images into tensors. Since MNIST images are already a standard size of 28x28, no additional resizing is needed.

I then create data loaders using PyTorch’s DataLoader class. This allows me to efficiently load the dataset in batches of 32 images at a time. Using batch processing is essential for training deep learning models, as it optimizes memory usage and speeds up computation, especially when training on a GPU. The training set contains 60,000 images, while the test set contains 10,000 images.

Finally, I check if a GPU is available and set the device accordingly. If a GPU is present, the model and data will be processed on it; otherwise, it will fall back to the CPU. A quick verification prints the shape of the first batch to ensure everything is loaded correctly.

2.1 Loading the `MNIST` Dataset:

In [None]:
import torch
import torchvision
from torchvision import transforms
from torch.utils.data import DataLoader

# Define transforms
transform = transforms.Compose([
    transforms.ToTensor(),  # Convert images to tensors
])

# Load MNIST dataset
trainset = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transform)
testset = torchvision.datasets.MNIST(root='./data', train=False, download=True, transform=transform)

# Check dataset details
print(f"Number of training samples: {len(trainset)}")  # Should be 60,000
print(f"Number of test samples: {len(testset)}")    # Should be 10,000
print(f"Classes: {trainset.classes}")               # Class names, from '0 - zero' to '9 - nine'

# Define batch size
batch_size = 32  # Increased from 8

# Create data loaders
trainloader = DataLoader(trainset, batch_size=batch_size, shuffle=True, num_workers=2)
testloader = DataLoader(testset, batch_size=batch_size, shuffle=False, num_workers=2)

# Set device to GPU if available, otherwise CPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Example: Iterate through one batch to verify
for images, labels in trainloader:
    images, labels = images.to(device), labels.to(device)
    print(f"Batch shape: {images.shape}")  # Should be [32, 1, 28, 28] (batch_size, channels, height, width)
    print(f"Labels: {labels}")
    break

Next, I verify what the shape of an individual tensor looks like:

In [None]:
train_iter = iter(trainset)

image, label = next(train_iter)

image.shape, label

Below, the labels and corresponding index values are confirmed:

In [None]:
print(trainset.class_to_idx)

2.2 Visualizing Tensor:

In [None]:
import matplotlib.pyplot as plt

# Define class names (0-9 for MNIST)
classes = [str(i) for i in range(10)]  

index = 400  # Select an image by index

image, label = trainset[index]  # Get the image and its label

plt.imshow(image.squeeze(0), cmap="gray")  # Drop the channel dimension ([1, 28, 28] -> [28, 28]) and use a grayscale colormap
plt.title(f"Label: {classes[label]}")
plt.show()

#### 3. Preparing the Data for Training

To evaluate model performance, we will use the predefined training and test sets provided by the MNIST dataset. Below, I confirm the sizes of both datasets:

In [None]:
len(trainset), len(testset)

3.1 Create Subsets of Training Data:

In [None]:
# Split the original trainset into two parts: 80% for training and 20% for validation
trainset, valset = torch.utils.data.random_split(trainset, [48000, 12000])

# Verify the sizes of the new train and validation sets
print(f"New training set size: {len(trainset)}")  # Should be 48,000 (80% of 60,000)
print(f"Validation set size: {len(valset)}")     # Should be 12,000 (20% of 60,000)

# Check that the testset remains unchanged
print(f"Test set size: {len(testset)}")  # Should be 10,000
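
The split above is random, so the validation set changes on every run. If a reproducible split is wanted, one option is to pass a seeded generator to `random_split`. The sketch below illustrates this on a toy dataset; the seed value 42 and the helper `split` are arbitrary choices for illustration and are not part of the original notebook.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Toy dataset of 10 samples standing in for the 60,000 MNIST training images.
toy = TensorDataset(torch.arange(10).float().unsqueeze(1), torch.arange(10))

def split(seed):
    """Split the toy dataset 80/20 using a generator seeded with `seed`."""
    generator = torch.Generator().manual_seed(seed)
    return random_split(toy, [8, 2], generator=generator)

train_a, val_a = split(seed=42)
train_b, val_b = split(seed=42)

print(len(train_a), len(val_a))
print(val_a.indices == val_b.indices)  # the same seed reproduces the same split
```

With a fixed seed, validation accuracy from different runs is measured on the same held-out images.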

The following code prints the number of batches our data will be divided into during training:

In [None]:
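import math

# Round up: by default a DataLoader keeps the last, smaller batch (drop_last=False)
num_batches_train = math.ceil(len(trainset) / batch_size)
num_batches_test = math.ceil(len(testset) / batch_size)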
print(f'Number of batches in the training set: {num_batches_train}')
print(f'Number of batches in the test set: {num_batches_test}')

#### 4. Model Preparation

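A DataLoader is not strictly required for GPU training, but it makes training practical: it batches and shuffles the data and can load it in parallel using worker processes. Each batch is then moved from the CPU to the GPU with .to(device) inside the training loop. This section sets up the data loaders for the training, validation, and test sets.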

4.1 Setup Data Loaders:

In [25]:
trainloader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,
                                          shuffle=True, num_workers=2)

valloader = torch.utils.data.DataLoader(valset, batch_size=batch_size,
                                          shuffle=False, num_workers=2)

testloader = torch.utils.data.DataLoader(testset, batch_size=batch_size,
                                         shuffle=False, num_workers=2)

In this project, we are building a Convolutional Neural Network (CNN) for image classification using the MNIST dataset, which consists of 28x28 grayscale images of handwritten digits (0-9).

**The model is defined as a class in PyTorch, and consists of two main parts:**

`1. Layers Definition:` This part defines the layers of the model, including convolutional layers, max pooling layers, linear layers, and dropout layers. The output channels of one layer must match the input channels of the next.

`2. Forward Function:` This function defines how the data flows through the model, layer by layer. It also specifies activation functions (such as ReLU) that introduce non-linearity into the model, helping it learn more complex patterns in the data.

**Convolutional Layers**
In a CNN, the convolutional layers (e.g., self.conv1, self.conv2, self.conv3) apply filters to the input image to highlight important features, like edges and textures. Each convolution operation creates feature maps by sliding a filter across the image, performing a mathematical operation at each position. The Conv2d class in PyTorch is used to define these convolutional layers.

For MNIST, the input images are 28x28 pixels and have a single channel (grayscale). In the first convolutional layer (self.conv1), the input channels are set to 1 (for grayscale), and the output channels are set to 64. The output channels represent different feature maps generated by the filters.

After each convolution, the output is passed through a ReLU activation function to introduce non-linearity. ReLU helps the model learn more complex patterns by turning negative values to zero while keeping positive values unchanged.
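
As a small illustration of the two ideas above, the sketch below passes one random tensor (a stand-in for an image, not real data) through a convolution configured like the first layer in section 4.2, then applies ReLU to a few hand-picked values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# One fake grayscale image: (batch, channels, height, width).
x = torch.randn(1, 1, 28, 28)

# Same settings as self.conv1 in section 4.2: 64 filters, kernel 5, padding 2.
conv = nn.Conv2d(in_channels=1, out_channels=64, kernel_size=5, padding=2)
feature_maps = conv(x)
print(feature_maps.shape)  # one 28x28 feature map per filter

# ReLU sets negative values to zero and leaves positive values unchanged.
values = torch.tensor([-2.0, -0.5, 0.0, 1.5])
activated = F.relu(values)
print(activated)
```

With a kernel size of 5 and padding of 2, the spatial size stays at 28x28; only the number of channels changes.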

**Max Pooling Layers**
Max pooling layers (MaxPool2d) are used after the convolutional layers to downsample the feature maps. Pooling reduces the dimensionality of the data, making the model more efficient and less complex while retaining the most important features. In this model, we use a 2x2 pooling window to reduce the spatial size of the feature maps.
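
The effect of a 2x2 pooling window is easiest to see on a tiny example. The 4x4 feature map below is made up for illustration.

```python
import torch
import torch.nn as nn

# A single 4x4 feature map: (batch, channels, height, width).
feature_map = torch.tensor([[[[1., 2., 5., 6.],
                              [3., 4., 7., 8.],
                              [9., 10., 13., 14.],
                              [11., 12., 15., 16.]]]])

pool = nn.MaxPool2d(kernel_size=2, stride=2)
pooled = pool(feature_map)

print(pooled.shape)  # height and width are halved
print(pooled)        # each value is the maximum of one 2x2 window
```

Each non-overlapping 2x2 window collapses to its largest value, so a 4x4 map becomes 2x2.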

**Why This Architecture Works for MNIST**
The convolutional layers help the model learn spatial hierarchies in the image, such as edges, shapes, and digits, while the pooling layers help reduce computational complexity. This combination of convolution, activation, and pooling layers is ideal for the MNIST dataset, as it allows the model to efficiently extract important features while maintaining computational efficiency. The fully connected layers then help classify the features into one of the 10 digit categories.

4.2 Define CNN Architecture:

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F

class NeuralNet(nn.Module):
    def __init__(self):
        super().__init__()

        # Convolutional layers
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=64, kernel_size=5, padding=2)
        self.pool1 = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(in_channels=64, out_channels=128, kernel_size=5, padding=2)
        self.pool2 = nn.MaxPool2d(2, 2)
        self.conv3 = nn.Conv2d(in_channels=128, out_channels=256, kernel_size=4, padding=1)
        self.pool3 = nn.MaxPool2d(2, 2)

        # Calculate the size of the flattened features after all convolutional and pooling layers
        self._flattened_features = self._get_conv_output_size()

        # Fully connected layers
        self.fc1 = nn.Linear(in_features=self._flattened_features, out_features=1024)
        self.drop1 = nn.Dropout(p=0.3)
        self.fc2 = nn.Linear(in_features=1024, out_features=512)
        self.drop2 = nn.Dropout(p=0.3)
        self.out = nn.Linear(in_features=512, out_features=10) # out_features match class labels

    def _get_conv_output_size(self):
        # Create a dummy input to pass through the convolutional layers only
        dummy_input = torch.zeros(1, 1, 28, 28)  # Use the provided input dimensions
        dummy_input = self.conv1(dummy_input)
        dummy_input = self.pool1(dummy_input)
        dummy_input = self.conv2(dummy_input)
        dummy_input = self.pool2(dummy_input)
        dummy_input = self.conv3(dummy_input)
        dummy_input = self.pool3(dummy_input)
        return int(torch.flatten(dummy_input, 1).size(1))

    def forward(self, x):
        # Convolutional and pooling layers
        x = F.relu(self.conv1(x))
        x = self.pool1(x)
        x = F.relu(self.conv2(x))
        x = self.pool2(x)
        x = F.relu(self.conv3(x))
        x = self.pool3(x)

        # Flatten the output for the fully connected layers
        x = torch.flatten(x, 1)

        # Fully connected layers with ReLU activations and dropout
        x = F.relu(self.fc1(x))
        x = self.drop1(x)
        x = F.relu(self.fc2(x))
        x = self.drop2(x)

        # Output layer
        x = self.out(x)

        return x
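
The flattened feature size that `_get_conv_output_size` finds with a dummy input can also be worked out by hand. The sketch below uses the standard output-size formula for convolution and pooling layers, `(size + 2 * padding - kernel) // stride + 1`, with the kernel sizes and paddings from the class above. The helper name `conv_out` is hypothetical.

```python
def conv_out(size, kernel, padding=0, stride=1):
    """Spatial size after a convolution or pooling layer (rounded down)."""
    return (size + 2 * padding - kernel) // stride + 1

size = 28                                    # MNIST input height and width
size = conv_out(size, kernel=5, padding=2)   # conv1 -> 28
size = conv_out(size, kernel=2, stride=2)    # pool1 -> 14
size = conv_out(size, kernel=5, padding=2)   # conv2 -> 14
size = conv_out(size, kernel=2, stride=2)    # pool2 -> 7
size = conv_out(size, kernel=4, padding=1)   # conv3 -> 6
size = conv_out(size, kernel=2, stride=2)    # pool3 -> 3

flattened = 256 * size * size                # 256 output channels from conv3
print(size, flattened)                       # 3 2304
```

So self.fc1 receives 2,304 input features. Computing this with a dummy forward pass, as the class does, avoids redoing the arithmetic whenever a layer changes.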

Because training may run on a GPU, the model must be moved to the selected device before use. On a CPU-only machine the same call simply leaves the model on the CPU.

Additionally, I'll check the shape of the input and output features to ensure they match the model architecture, with the output having the defined batch size and the correct number of class labels (10 in my dataset).

4.3 Push Model to Device and Verify Parameters:

In [None]:
net = NeuralNet()
net.to(device)

The code below checks the shape of my initial inputs and my final outputs:

In [None]:
for i, data in enumerate(trainloader):
    inputs, labels = data[0].to(device), data[1].to(device)
    print(f'input shape: {inputs.shape}')
    print(f'after network shape: {net(inputs).shape}')
    break

To see how many trainable parameters the CNN has, I sum the number of elements across all of its parameter tensors. A large parameter count increases memory use and, together with large batches, can contribute to out-of-memory errors. If that happens, try reducing the batch size or simplifying the model by decreasing the number of channels in the layers. When doing so, keep the architecture consistent: each layer's input size must still match the previous layer's output size.

In [None]:
num_params = 0
for param in net.parameters():
  num_params += param.numel()  # Number of elements in this parameter tensor

print(f'Number of parameters in the model: {num_params:,}')
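
As a cross-check, the same total can be derived by hand from the layer definitions in section 4.2. The sketch below assumes the usual parameter counts (weights plus one bias per output) for convolutional and linear layers. The helper names are hypothetical, and 2,304 is the flattened feature size for 28x28 inputs.

```python
def conv_params(in_ch, out_ch, kernel):
    """Weights (out_ch * in_ch * k * k) plus one bias per output channel."""
    return out_ch * in_ch * kernel * kernel + out_ch

def linear_params(in_features, out_features):
    """Weights (in * out) plus one bias per output feature."""
    return in_features * out_features + out_features

layers = {
    "conv1": conv_params(1, 64, 5),
    "conv2": conv_params(64, 128, 5),
    "conv3": conv_params(128, 256, 4),
    "fc1": linear_params(2304, 1024),
    "fc2": linear_params(1024, 512),
    "out": linear_params(512, 10),
}

for name, count in layers.items():
    print(f"{name}: {count:,}")

total = sum(layers.values())
print(f"total: {total:,}")                       # total: 3,621,386
print(f"float32 size: {total * 4 / 1e6:.1f} MB")  # weights only, 4 bytes each
```

Most of the parameters sit in fc1. At roughly 14.5 MB in float32, the weights themselves are small, which suggests that out-of-memory errors are more likely to come from activations and batch size than from the parameter count alone.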

#### 5. Select Loss Function and Optimization Technique

I'm using CrossEntropyLoss as the loss function for multi-class classification and the Adam optimizer with a learning rate of 0.0001. The loss function measures how well the model's predictions match the true labels, and the learning rate controls how aggressively the optimizer updates the model's parameters during training. Experimenting with different learning rates, optimizers, and loss functions can help improve model performance.

5.1 Loss Function and Learning Rate:

In [30]:
import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=0.0001)
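
To show what CrossEntropyLoss computes, the sketch below compares it with a manual calculation on made-up scores. The logits and targets are hypothetical values for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical raw scores (logits) for a batch of 2 images and 10 classes.
torch.manual_seed(0)
logits = torch.randn(2, 10)
targets = torch.tensor([3, 7])  # true class indices

criterion = nn.CrossEntropyLoss()
loss = criterion(logits, targets)

# Equivalent manual computation: log-softmax over the classes,
# then the mean negative log-probability of the true class.
log_probs = F.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(2), targets].mean()

print(loss.item(), manual.item())
```

Because CrossEntropyLoss applies log-softmax internally, the network's final layer can output raw scores without a softmax of its own.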

#### 6. Define and Run the Training Loop

Training deep learning models can be challenging due to computational bottlenecks, especially with complex models and large datasets. To keep training manageable, we process the data in batches and repeat the pass over all batches for several epochs. This limits how much data sits in GPU memory at once and allows loss and accuracy to be reported while training runs. Each epoch consists of a training pass over the training set followed by a separate validation pass over the held-out validation set.

6.1 Setup Training Epoch

In [None]:
def train_one_epoch():
    print("Starting training epoch...")
    net.train(True)
    
    running_loss = 0.0
    running_accuracy = 0.0
    batch_count = 0
    
    for batch_index, data in enumerate(trainloader):
        batch_count += 1
        
        inputs, labels = data[0].to(device), data[1].to(device)
        
        optimizer.zero_grad()
        
        outputs = net(inputs)
        correct = torch.sum(labels == torch.argmax(outputs, dim=1)).item()
        running_accuracy += correct / labels.size(0)  # Use the actual batch size (the last batch may be smaller)
        
        loss = criterion(outputs, labels)
        running_loss += loss.item()
        loss.backward()
        optimizer.step()
        
        if batch_index % 20 == 19:  # print every 20 batches
            avg_loss_across_batches = running_loss / 20
            avg_acc_across_batches = (running_accuracy / 20) * 100
            print('Batch {0}, Loss: {1:.3f}, Accuracy: {2:.1f}%'.format(
                batch_index + 1, avg_loss_across_batches, avg_acc_across_batches))
            running_loss = 0.0
            running_accuracy = 0.0
    
    print(f"Total batches processed: {batch_count}")
    print("Finished training epoch.")

# Call the function once as a smoke test (note: this already updates the weights before the main loop in section 7)
train_one_epoch()

6.2 Setup Validation Epoch

In [None]:
def validate_one_epoch():
    try:
        print("Starting validation epoch...")
        net.train(False)
        running_loss = 0.0
        running_accuracy = 0.0
        total_samples = 0
        
        print(f"Number of batches in valloader: {len(valloader)}")
        
        for i, data in enumerate(valloader):
            print(f"Processing validation batch {i + 1}")
            inputs, labels = data[0].to(device), data[1].to(device)
            
            with torch.no_grad():
                outputs = net(inputs)
                correct = torch.sum(labels == torch.argmax(outputs, dim=1)).item()
                running_accuracy += correct
                total_samples += len(labels)
                loss = criterion(outputs, labels)
                running_loss += loss.item()
        
        avg_loss_across_batches = running_loss / len(valloader)
        avg_acc_across_batches = (running_accuracy / total_samples) * 100
        
        print('Val Loss: {0:.3f}, Val Accuracy: {1:.1f}%'.format(
            avg_loss_across_batches, avg_acc_across_batches))
        print('***************************************************')
        print()
    except Exception as e:
        print(f"An error occurred during validation: {e}")

# Call the function after training
validate_one_epoch()

#### 7. Train the Model

Now that we have defined our training and validation loops, we can train our model. We will run 10 epochs to optimize its performance and improve accuracy.

7.1 Run Epochs:

In [None]:
num_epochs = 10

for epoch_index in range(num_epochs):
    print(f'Epoch: {epoch_index + 1}\n')
    
    train_one_epoch()
    validate_one_epoch()
    
print('Finished Training')

#### 8. Evaluate the Model

After training, our model can be evaluated like any classification model. Key metrics include classification accuracy and the true/false positive and negative rates, which can be visualized using a confusion matrix with `matplotlib` and `seaborn`. We will use several metrics from sklearn to assess performance: we run the test dataset through the model, generate predictions, and compare them to the actual class labels to identify errors.

In [None]:
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score

# Get predicted labels from the model
predicted_labels = []
true_labels = []

# Iterate through test set and collect predictions
net.eval()  # Evaluation mode: disables dropout
with torch.no_grad():  # Gradients are not needed for inference
    for images, labels in testloader:
        images = images.to(device)
        labels = labels.to(device)
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        predicted_labels.extend(predicted.cpu().numpy())
        true_labels.extend(labels.cpu().numpy())

# Calculate accuracy
accuracy = accuracy_score(true_labels, predicted_labels)
print("Accuracy:", accuracy)

# Generate classification report
class_report = classification_report(true_labels, predicted_labels, target_names=classes)


# Generate confusion matrix
conf_matrix = confusion_matrix(true_labels, predicted_labels)

In [None]:
print("Classification Report:\n", class_report)

In [None]:
print("Confusion Matrix:\n", conf_matrix)
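
The per-class figures in the classification report can be read directly off a confusion matrix. The sketch below uses a made-up 3-class matrix purely for illustration; the numbers are not results from this model. As in sklearn, rows are true classes and columns are predicted classes.

```python
# Made-up 3-class confusion matrix for illustration only
# (rows = true class, columns = predicted class).
toy_matrix = [
    [50, 2, 0],
    [3, 45, 2],
    [0, 5, 43],
]

n = len(toy_matrix)
total = sum(sum(row) for row in toy_matrix)
correct = sum(toy_matrix[i][i] for i in range(n))  # diagonal = correct predictions
accuracy = correct / total

precisions, recalls = [], []
for c in range(n):
    true_pos = toy_matrix[c][c]
    predicted_as_c = sum(toy_matrix[r][c] for r in range(n))  # column sum
    actually_c = sum(toy_matrix[c])                            # row sum
    precisions.append(true_pos / predicted_as_c)
    recalls.append(true_pos / actually_c)
    print(f"class {c}: precision={precisions[c]:.3f}, recall={recalls[c]:.3f}")

print(f"accuracy={accuracy:.3f}")
```

Precision divides by a column sum (everything predicted as that class), while recall divides by a row sum (everything that truly belongs to it).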

Below, the confusion matrix is visualized with more formatting:

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

# Extract class labels from the dataset
class_labels = testset.classes

# Plot confusion matrix
plt.figure(figsize=(8, 6))
sns.set(font_scale=1.2)  # Set font scale
sns.heatmap(conf_matrix, annot=True, cmap="Blues", fmt="d", xticklabels=class_labels, yticklabels=class_labels)
plt.xlabel("Predicted labels")
plt.ylabel("True labels")
plt.title("Confusion Matrix")
plt.show()


#### 9. Saving the Trained Model

The metrics indicate that our model performs well on the test set. To avoid retraining it each time we want to make predictions on new data, we can save the model's learned parameters (its state_dict) as a .pth file in the directory set by save_dir.

In [None]:
import os

# Define the directory path to save the model
save_dir = r"C:/Users/ryanj/Desktop/COGS/2010/project_2"

# Define the file name for the saved model
model_name = "mnist_epoch10.pth"
save_path = os.path.join(save_dir, model_name)

# Save the model
torch.save(net.state_dict(), save_path)

print(f"Model saved at: {save_path}")
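
The save and load round trip can be tried on a tiny stand-in model before touching the trained network. In the sketch below, the `nn.Linear` model, the temporary directory, and the file name `toy_model.pth` are all hypothetical choices for illustration. Passing `map_location="cpu"` is one way to make a checkpoint saved on a GPU loadable on a CPU-only machine.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Tiny stand-in model; the same pattern applies to NeuralNet.
model = nn.Linear(4, 2)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_model.pth")  # hypothetical file name
    torch.save(model.state_dict(), path)

    # Rebuild the same architecture, then load the saved parameters into it.
    restored = nn.Linear(4, 2)
    state = torch.load(path, map_location="cpu")
    restored.load_state_dict(state)

restored.eval()

# Every parameter tensor should match the original exactly.
same = all(
    torch.equal(a, b)
    for a, b in zip(model.state_dict().values(), restored.state_dict().values())
)
print(same)
```

A state_dict holds only the parameters, so the model class has to be defined and instantiated before the weights can be loaded.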

Now, I can load this model into my project separately, and test it on the same test data to make sure it is working:

In [None]:
import os
import torch

# Define the model architecture
net_test = NeuralNet()  # Ensure NeuralNet is defined elsewhere in your code

# Define the directory path and model name
save_dir = r"C:/Users/ryanj/Desktop/COGS/2010/project_2"
model_name = "mnist_epoch10.pth"
save_path = os.path.join(save_dir, model_name)

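# Select the device first so the saved weights can be mapped onto it
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the saved model state dictionary
net_test.load_state_dict(torch.load(save_path, map_location=device))

# Move the model to the selected device (GPU if available, otherwise CPU)
net_test.to(device)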

# Set the model to evaluation mode
net_test.eval()

# Initialize counters for correct predictions and total samples
correct = 0
total = 0

# Disable gradient calculation for evaluation
with torch.no_grad():
    for images, labels in testloader:
        # Move images and labels to the selected device
        images, labels = images.to(device), labels.to(device)
        
        # Forward pass: Get model outputs
        outputs = net_test(images)
        
        # Get the predicted class with the highest score
        _, predicted = torch.max(outputs.data, 1)
        
        # Update total samples and correct predictions
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

# Calculate accuracy
accuracy = 100 * correct / total
print(f"Accuracy of the network on the test images: {accuracy:.2f}%")

#### Conclusion:

In this project, we successfully developed and trained a convolutional neural network (CNN) to classify handwritten digits from the MNIST dataset. The model achieved an impressive accuracy of **99.33%** on the test set, demonstrating its effectiveness in recognizing digit patterns. We implemented robust training and validation loops, utilized appropriate loss functions and optimizers, and evaluated the model's performance using various metrics, including a confusion matrix and classification report. Finally, we saved the trained model for future use, ensuring that it can be easily loaded and utilized for making predictions without the need for retraining. Overall, the project highlights the power of deep learning in image classification tasks.