In [None]:
%reset -f
%load_ext autoreload
%autoreload 2
%matplotlib inline

# Step 1: Dataset Preparation

First, we need to prepare our dataset. The dataset consists of images of different dog breeds, stored in separate folders according to breed. In our example, we assume that the dataset lives in dogs/, with the training and test images in dogs/Training and dogs/Test, and that each dog breed has its own folder inside those directories. Here is how you can load the dataset using PyTorch's torchvision library:

In [None]:
from sys import platform
import os

# sys.platform is "darwin" on macOS, which contains "win", so test the prefix instead
if not platform.startswith("win"):
    if not os.path.exists("dogs/"):
        !unzip dogs.zip
else:
    print("Your OS is Windows, unzip dogs.zip manually")
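
The shell command above is not available on every system. As a rough alternative, Python's standard `zipfile` module can do the extraction on any OS. This is only a sketch: the helper name `extract_if_missing` and its `dest` parameter are made up for illustration, and it assumes that `dogs.zip` contains a top-level `dogs/` folder.

```python
import zipfile
from pathlib import Path


def extract_if_missing(archive, target_dir, dest="."):
    """Extract `archive` into `dest` unless `target_dir` already exists.

    Returns True if the archive was extracted, False otherwise.
    (Hypothetical helper, not part of the original notebook.)
    """
    archive, target_dir = Path(archive), Path(target_dir)
    # Skip the work if the data is already there or the archive is missing
    if target_dir.exists() or not archive.exists():
        return False
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
    return True


# Assumes dogs.zip sits next to the notebook; does nothing if it is absent
extract_if_missing("dogs.zip", "dogs")
```

Because the function returns early when the target folder exists, re-running the cell is harmless.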

In [None]:
import torchvision.datasets as datasets

# Define the paths of the directories where the data is stored
train_data_dir = 'dogs/Training'
test_data_dir = 'dogs/Test'

# Load the dataset using PyTorch's torchvision.datasets.ImageFolder
train_dataset = datasets.ImageFolder(train_data_dir)
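
`ImageFolder` assigns label IDs by sorting the class folder names, so the order in which the operating system lists the folders does not matter. The sketch below mimics that mapping with the standard library only. The folder names are hypothetical examples in the `id-breed` style the notebook assumes, and `find_classes` here is a simplified stand-in rather than torchvision's own code.

```python
import tempfile
from pathlib import Path


def find_classes(root):
    """Return (classes, class_to_idx) using sorted subfolder names as classes."""
    classes = sorted(p.name for p in Path(root).iterdir() if p.is_dir())
    return classes, {name: i for i, name in enumerate(classes)}


# Hypothetical breed folders, created in a temporary directory for the demo
with tempfile.TemporaryDirectory() as root:
    for name in ["n02099601-golden_retriever",
                 "n02085620-Chihuahua",
                 "n02088364-beagle"]:
        (Path(root) / name).mkdir()
    classes, class_to_idx = find_classes(root)

# Human-readable names in label order: drop the ID prefix before the hyphen
breed_names = [c.split("-")[1] for c in classes]
print(class_to_idx)
print(breed_names)
```

Any list of display names should be built from this sorted order, otherwise label IDs and names can drift apart.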

In [None]:
import matplotlib.pyplot as plt

# ImageFolder sorts the class folders, so take the names from the dataset itself
# to keep class_names[i] aligned with label ID i
class_names = [name.split('-')[1] for name in train_dataset.classes]
class_idx = {v: k for k, v in train_dataset.class_to_idx.items()}

example = train_dataset[620]
plt.imshow(example[0]) # index 0 contains the image
plt.title(f"LABEL ID={example[1]}, NAME={class_idx[example[1]].split('-')[1]}"); # index 1 contains the label

# Step 2: Preprocessing

Before training the model, we need to preprocess our data to normalize the pixel values and resize the images to a consistent size. We can use PyTorch's transforms module to apply the necessary preprocessing steps:

In [None]:
import torchvision.transforms as transforms

# Define the transformation pipeline
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    # transforms.Normalize(mean=[0.485, 0.456, 0.406],
    #                      std=[0.229, 0.224, 0.225])
])

# Apply the transformation pipeline to our dataset
train_dataset = datasets.ImageFolder(train_data_dir, transform=transform)
test_dataset = datasets.ImageFolder(test_data_dir, transform=transform)

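Here, we resize the images to a 224x224 resolution and convert them to PyTorch tensors, which scales the pixel values to the range [0, 1]. The normalization step that uses the mean and standard deviation of the ImageNet dataset is commented out in the pipeline above. Uncomment it to feed the pre-trained network inputs with the statistics it was trained on, keeping in mind that normalized images no longer display correctly with plt.imshow unless the normalization is undone first.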
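
To make the `Normalize` step concrete, here is a small sketch in plain Python of the per-channel arithmetic it performs, `(x - mean) / std`, together with the inverse that would be needed before plotting. The mean and standard deviation are the ImageNet values from the commented-out lines above; the function names are made up for illustration.

```python
IMAGENET_MEAN = [0.485, 0.456, 0.406]
IMAGENET_STD = [0.229, 0.224, 0.225]


def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Apply (x - mean) / std to each channel of one RGB pixel in [0, 1]."""
    return [(x - m) / s for x, m, s in zip(rgb, mean, std)]


def denormalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Undo normalize_pixel, e.g. before displaying an image."""
    return [x * s + m for x, m, s in zip(rgb, mean, std)]


white = [1.0, 1.0, 1.0]
normalized = normalize_pixel(white)
restored = denormalize_pixel(normalized)
print(normalized)
print(restored)
```

A pixel equal to the channel means maps to zero, and brighter pixels map to positive values.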

# Step 3: Splitting the Dataset

To train and tune our model, we need training, validation, and test sets. The test set already lives in its own folder, so we only need PyTorch's random_split function to divide the training data into training and validation subsets:

In [None]:
from torch.utils.data import random_split

# Define the sizes of the training, validation, and test sets
train_size = int(0.7 * len(train_dataset))
# Use the remainder so that the two sizes always sum to the dataset length
val_size = len(train_dataset) - train_size
test_size = len(test_dataset)

# Split the training data randomly into training and validation subsets
train_dataset, val_dataset = random_split(train_dataset, [train_size, val_size])

In this example, we are using a 70-30 split for our training and validation. We keep the test set separate since we have a separate folder for it (`'dogs/Test'`).
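
When `random_split` is given integer lengths, they must sum to the length of the dataset. The sketch below, with a hypothetical helper `split_sizes` and a made-up dataset size, shows how truncating both fractions separately can lose a sample, and how taking the remainder for the validation set avoids that.

```python
def split_sizes(n_samples, train_fraction=0.7):
    """Return (train_size, val_size) that always sum to n_samples."""
    train_size = int(train_fraction * n_samples)
    val_size = n_samples - train_size  # remainder, so no sample is lost to rounding
    return train_size, val_size


n = 101  # hypothetical dataset size
naive_total = int(0.7 * n) + int(0.3 * n)  # both products are truncated
train_size, val_size = split_sizes(n)
print(naive_total, train_size + val_size)
```

With 101 samples the naive sizes add up to 100, one short of the dataset length, while the remainder-based sizes cover every sample.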

# Step 4: Creating Data Loaders

Next, we need to create data loaders to feed our data to the model during training. Data loaders are a PyTorch abstraction that groups samples into batches and optionally shuffles them. When num_workers is greater than 0, they also load batches in worker processes in parallel with the model's computation.

In [None]:
from torch.utils.data import DataLoader

# Define the batch size for the data loaders
batch_size = 16

# Create data loaders for the training, validation, and test sets
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
val_loader = DataLoader(val_dataset, batch_size=batch_size)
test_loader = DataLoader(test_dataset, batch_size=batch_size)

Here, we define a batch size of 16 and use the DataLoader class to create data loaders for our training, validation, and test sets. Only the training loader shuffles its data.
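
With the default `drop_last=False`, a data loader yields a final, smaller batch whenever the dataset size is not a multiple of the batch size, so the number of batches is the ceiling of the division rather than the floor. A small sketch of that arithmetic, using a hypothetical helper and made-up sizes:

```python
import math


def batch_layout(n_samples, batch_size):
    """Return (number of batches, size of the last batch) when no batch is dropped."""
    if n_samples == 0:
        return 0, 0
    n_batches = math.ceil(n_samples / batch_size)
    last_batch = n_samples - (n_batches - 1) * batch_size
    return n_batches, last_batch


print(batch_layout(100, 16))  # dataset size not divisible by the batch size
print(batch_layout(96, 16))   # dataset size divisible by the batch size
```

This is the value `len(loader)` reports, which makes it a convenient total for progress bars.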

# Step 5: Defining the Model

Now, we can define our model architecture. In this example, we will use a pre-trained ResNet-18 model as our feature extractor:

In [None]:
import torch
import torch.nn as nn
import torchvision.models as models

# Load a pre-trained ResNet-18 model
model = models.resnet18(pretrained=True)

# Freeze all layers in the model
for param in model.parameters():
    param.requires_grad = False

# Replace the last fully connected layer with a new one that outputs the number of classes
num_classes = len(train_dataset.dataset.classes)
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Move the model to the GPU if available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)

In this example, we use a pre-trained ResNet-18 model as our feature extractor and replace the last fully connected layer with a new one that outputs the number of classes in our dataset. We also freeze all the layers in the model except for the new fully connected layer. Finally, we move the model to the GPU if it is available.
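
A quick way to check that freezing worked is to count the parameters that still require gradients. The sketch below uses a tiny `nn.Sequential` as a hypothetical stand-in for the backbone, so that nothing has to be downloaded; the same loop and the same check apply to the ResNet-18 above, where the replaced layer is `model.fc`.

```python
import torch.nn as nn

# Hypothetical stand-in for a pre-trained network (not ResNet-18)
toy_model = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 4))

# Freeze every existing parameter
for param in toy_model.parameters():
    param.requires_grad = False

# Replace the last layer; a freshly created layer requires gradients by default
toy_model[2] = nn.Linear(4, 3)

trainable_names = [name for name, p in toy_model.named_parameters() if p.requires_grad]
n_trainable = sum(p.numel() for p in toy_model.parameters() if p.requires_grad)
n_total = sum(p.numel() for p in toy_model.parameters())
print(trainable_names, n_trainable, n_total)
```

Only the weight and bias of the new layer should show up as trainable.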

# Step 6: Defining the Loss Function and Optimizer

Before we can train our model, we need to define a loss function and an optimizer. In this example, we will use cross-entropy loss and stochastic gradient descent (SGD) as our loss function and optimizer, respectively.

In [None]:
# Define the loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.fc.parameters(), lr=0.01, momentum=0.9)

Here, we use cross-entropy loss as our loss function and SGD with a learning rate of 0.01 and momentum of 0.9 as our optimizer. We only optimize the parameters in the last fully connected layer of our model.
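
For intuition, the two ingredients can be written out in plain Python for a single sample and a single scalar parameter. This is an illustrative sketch, not PyTorch's implementation: `cross_entropy` computes the negative log-softmax of the true class from raw scores, and `sgd_momentum_step` follows the update used by SGD with momentum when dampening and Nesterov are left at their defaults (velocity = momentum * velocity + gradient, then parameter -= lr * velocity). The function names are made up.

```python
import math


def cross_entropy(logits, target):
    """Negative log-probability of class `target` under softmax(logits)."""
    m = max(logits)  # subtract the maximum for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def sgd_momentum_step(param, grad, velocity, lr=0.01, momentum=0.9):
    """One SGD-with-momentum update for a scalar parameter."""
    velocity = momentum * velocity + grad
    param = param - lr * velocity
    return param, velocity


# A confident, correct prediction gives a small loss; a wrong one gives a large loss
print(cross_entropy([10.0, 0.0, 0.0], 0), cross_entropy([10.0, 0.0, 0.0], 1))

# Two updates with the same gradient: the second step is larger because of momentum
p, v = sgd_momentum_step(1.0, 2.0, 0.0)
p2, v2 = sgd_momentum_step(p, 2.0, v)
print(p, p2)
```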

# Step 7: Training the Model

Now, we can train our model using a loop that iterates over the training data for multiple epochs:

In [None]:
from tqdm.notebook import tqdm

# Define the number of epochs to train for (increase it)
num_epochs = 3

# Train the model for multiple epochs
for epoch in tqdm(range(num_epochs), desc="Total progress (in epochs)"):
    model.train()  # training mode (affects layers such as BatchNorm)
    running_loss = 0.0
    correct = 0
    total = 0
    for i, (inputs, labels) in tqdm(enumerate(train_loader), desc="Training (in batches)", total=len(train_loader)):
        # Move the inputs and labels to the GPU if available
        inputs = inputs.to(device)
        labels = labels.to(device)

        # Zero the parameter gradients
        optimizer.zero_grad()

        # Forward pass
        outputs = model(inputs)
        loss = criterion(outputs, labels)

        # Backward pass and optimization
        loss.backward()
        optimizer.step()

        # Accumulate the sample-weighted loss and the number of correct predictions
        running_loss += loss.item() * inputs.size(0)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    # Compute the average training loss and accuracy
    train_loss = running_loss / len(train_dataset)
    train_acc = 100 * correct / total

    # Compute the validation loss and accuracy
    model.eval()  # evaluation mode: BatchNorm uses its running statistics
    val_loss = 0.0
    correct = 0
    total = 0
    with torch.no_grad():
        for inputs, labels in tqdm(val_loader, desc="Validating (in batches)", total=len(val_loader)):
            # Move the inputs and labels to the GPU if available
            inputs = inputs.to(device)
            labels = labels.to(device)

            # Forward pass and prediction
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            val_loss += loss.item() * inputs.size(0)
            _, predicted = torch.max(outputs.data, 1)

            # Update total and correct counts
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    # Compute the average validation loss and accuracy
    val_loss /= len(val_dataset)
    val_acc = 100 * correct / total

    # Print statistics
    print(f'\nEpoch {epoch+1} finished! Training Loss: {train_loss:.4f}, Training Accuracy: {train_acc:.2f}%, Validation Loss: {val_loss:.4f}, Validation Accuracy: {val_acc:.2f}%')


Here, we loop over the training data for the specified number of epochs and perform forward and backward passes to update the model parameters. For each batch of data, we move the inputs and labels to the GPU if available, compute the loss, and perform backward propagation and an optimizer step. Along the way we accumulate the loss and the number of correct predictions so that we can report per-epoch averages.  
After each epoch, we evaluate the model on the validation set with gradient tracking disabled. We move the inputs and labels to the GPU if available, perform a forward pass to get the model's predictions, and accumulate the loss and the number of correctly predicted samples. We then print the training and validation loss and accuracy for the epoch.
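
The loop multiplies each batch loss by the batch size before summing because `nn.CrossEntropyLoss` returns the mean over a batch by default, and the last batch can be smaller than the others. A minimal sketch with made-up numbers shows the difference between the sample-weighted average and a plain average of batch means:

```python
def epoch_average(batch_losses, batch_sizes):
    """Average per-sample loss over an epoch from per-batch mean losses."""
    total_loss = sum(loss * n for loss, n in zip(batch_losses, batch_sizes))
    return total_loss / sum(batch_sizes)


batch_losses = [1.0, 2.0]  # hypothetical mean loss of each batch
batch_sizes = [16, 4]      # the last batch is smaller

weighted = epoch_average(batch_losses, batch_sizes)
unweighted = sum(batch_losses) / len(batch_losses)
print(weighted, unweighted)
```

The plain average gives the small batch the same weight as the full one, which overstates its influence on the epoch loss.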

# Step 8: Testing the Model

Once we have trained our model, we can test it on the test set to see how well it performs on unseen data:

In [None]:
# Evaluate the model on the test set
model.eval()  # make sure layers such as BatchNorm are in evaluation mode
correct = 0
total = 0
with torch.no_grad():
    for inputs, labels in tqdm(test_loader, desc="Performing inference", total=len(test_loader)):
        # Move the inputs and labels to the GPU if available
        inputs = inputs.to(device)
        labels = labels.to(device)

        # Forward pass and prediction
        outputs = model(inputs)
        _, predicted = torch.max(outputs.data, 1)

        # Update total and correct counts
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

# Print test accuracy (two decimals instead of truncating to an integer)
print(f'Test Accuracy: {100 * correct / total:.2f}%')

Here, we loop over the test set and perform a forward pass to get the model's predictions. We then compute the number of correctly predicted samples and print the test accuracy.
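
The prediction step, `torch.max(outputs, 1)`, picks the index of the largest score in each row. The same bookkeeping can be sketched in plain Python with made-up scores and labels; `argmax` and `accuracy` are illustrative helpers, not library functions.

```python
def argmax(row):
    """Index of the largest value in a list of class scores."""
    return max(range(len(row)), key=row.__getitem__)


def accuracy(scores, labels):
    """Percentage of rows whose highest-scoring class matches the label."""
    predictions = [argmax(row) for row in scores]
    n_correct = sum(p == y for p, y in zip(predictions, labels))
    return 100.0 * n_correct / len(labels)


# Hypothetical scores for four samples and three classes
scores = [[2.0, 0.1, -1.0],
          [0.3, 0.2, 1.5],
          [0.0, 0.9, 0.4],
          [1.1, 1.0, 0.2]]
labels = [0, 2, 1, 1]
test_acc = accuracy(scores, labels)
print(test_acc)
```

The last sample is misclassified because class 0 narrowly outscores the true class 1.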

# Step 9: Computing the Confusion Matrix

A confusion matrix is a table that shows the performance of a classification model on a set of test data for which the true values are known. It can help us understand which classes our model is performing well on and which classes it is struggling with. Here's how we can compute and plot the confusion matrix for our dog breed classification model:

In [None]:
from sklearn.metrics import confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt

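# Collect the true and predicted labels for the test set
model.eval()
y_true, y_pred = [], []
with torch.no_grad():
    for inputs, labels in test_loader:
        # Append the true labels to the list
        y_true.extend(labels.numpy())
        # Forward pass and prediction (inputs must be on the same device as the model)
        outputs = model(inputs.to(device))
        _, predicted = torch.max(outputs.data, 1)
        # Move the predictions to the CPU and store them as NumPy values
        y_pred.extend(predicted.cpu().numpy())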

# Compute the confusion matrix
cm = confusion_matrix(y_true, y_pred)

# Plot the confusion matrix
plt.figure(figsize=(10, 10))
sns.heatmap(cm, annot=True, cmap="Blues", xticklabels=class_names, yticklabels=class_names)
plt.xlabel('Predicted Labels')
plt.ylabel('True Labels')
plt.show()

Here, we loop over the test set once and collect both the true label and the predicted label for each sample. Using these two lists, we compute the confusion matrix with the confusion_matrix function from the scikit-learn library, where rows correspond to true labels and columns to predicted labels. Finally, we plot the confusion matrix as a heatmap with the seaborn library.
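
To see what the matrix contains, it can be built by hand: entry `[t][p]` counts the samples whose true label is `t` and predicted label is `p`. The sketch below uses made-up labels for three classes and also derives the per-class recall from the rows; both helper names are hypothetical.

```python
def confusion_matrix_counts(y_true, y_pred, n_classes):
    """Rows are true labels, columns are predicted labels."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def per_class_recall(cm):
    """Fraction of each true class that was predicted correctly."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(cm)]


# Hypothetical labels for six samples
true_labels = [0, 0, 1, 1, 2, 2]
pred_labels = [0, 1, 1, 1, 2, 0]

cm_counts = confusion_matrix_counts(true_labels, pred_labels, n_classes=3)
recall = per_class_recall(cm_counts)
print(cm_counts)
print(recall)
```

The diagonal holds the correct predictions, so a class with a weak diagonal entry relative to its row sum is one the model struggles with.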

# Step 10: Plotting Randomly Selected Images

To get a visual understanding of how well our model is performing, we can plot a few randomly selected images from the test set along with their true and predicted labels. Here's how we can do that:

In [None]:
import random

# Select 10 random images from the test set
indices = random.sample(range(len(test_dataset)), 10)

# Plot the images
fig, axes = plt.subplots(nrows=2, ncols=5, figsize=(15, 6))
for i, idx in enumerate(indices):
    ax = axes.flat[i]
    img, label = test_dataset[idx]
    img = img.permute(1, 2, 0)
    ax.imshow(img)
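    # test_loader does not shuffle, so y_pred[idx] is the prediction for test_dataset[idx]
    pred_name = class_idx[int(y_pred[idx])].split('-')[1]
    ax.set_title(f"True: {class_idx[label].split('-')[1]}\nPred: {pred_name}")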
    ax.axis('off')
plt.show()


Here, we first select 10 random indices from the test set using the random.sample function. We then loop over these indices and plot the corresponding images along with their true and predicted labels. We use subplots to plot the images in a grid of 2 rows and 5 columns.
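
If the same images should appear every time the cell is run, the sampling can be seeded. The sketch below is one possible approach using a dedicated `random.Random` instance, with a hypothetical seed and dataset size; it also shows how position `i` in `axes.flat` maps to a row and column of the 2x5 grid.

```python
import random


def sample_indices(n_items, k=10, seed=0):
    """Draw k distinct indices in [0, n_items) reproducibly."""
    rng = random.Random(seed)  # separate generator, leaves the global state untouched
    return rng.sample(range(n_items), k)


def grid_position(i, ncols=5):
    """Row and column of the i-th subplot when axes are visited in row-major order."""
    return divmod(i, ncols)


picked = sample_indices(100)  # hypothetical test set of 100 images
print(picked)
print([grid_position(i) for i in range(10)])
```

Changing the seed gives a different but equally repeatable selection.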

# Conclusion

In this tutorial, we have shown how to train a PyTorch model to classify dog breeds using a pre-trained ResNet-18 model as the feature extractor. We have covered all the necessary steps, including loading and preprocessing the data, defining the model architecture, defining the loss function and optimizer, training the model, and testing the model on the test set. With this tutorial, one should be able to train a PyTorch model for their own classification problem.

# The End