<a href="https://colab.research.google.com/github/rhiosutoyo/Teaching-Deep-Learning-and-Its-Applications/blob/main/10_1_resnet_model.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# ResNet Model
This document presents a PyTorch-based implementation for training and testing a simple computer vision model utilizing the ResNet-18 architecture on the CIFAR-10 dataset. The process includes data preprocessing, model training, evaluation, and prediction on individual images sourced from an external URL.

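1. **Data Preprocessing**: The CIFAR-10 dataset, comprising 60,000 32x32 color images in 10 classes (50,000 for training and 10,000 for testing), is loaded and preprocessed. Training data is augmented with random cropping (after 4-pixel padding) and random horizontal flipping, then normalized with per-channel mean and standard deviation; test data is only normalized.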
2. **Model Architecture**: A ResNet-18 model pre-trained on ImageNet is loaded, and its final fully connected layer is replaced with a new layer that outputs 10 classes, one per CIFAR-10 category.
3. **Training**: The model is trained using the Stochastic Gradient Descent (SGD) optimizer with a learning rate of 0.001 and momentum of 0.9. The training loop iterates over the dataset for 25 epochs, updating model weights based on the cross-entropy loss.
4. **Evaluation**: The trained model is evaluated on the test set, reporting the accuracy over 10,000 test images.
5. **Prediction on External Image**: The code includes functionality to predict the class of an external image. The image is downloaded from a URL, preprocessed to match the training data format, and passed through the trained model to predict its class.

This implementation demonstrates the practical use of ResNet-18 for a classification task and walks through training, testing, and single-image inference end to end. In the run recorded below, the model reaches 83.84% test accuracy after 25 epochs.
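
The normalization mentioned in the preprocessing step can be checked by hand. The sketch below applies `(value - mean) / std` per channel to a single RGB pixel, using the same CIFAR-10 statistics that appear in the transforms of this notebook. The helper names `normalize_pixel` and `denormalize_pixel` are hypothetical and exist only for illustration; in the notebook itself, `transforms.Normalize` does this work on whole tensors.

```python
# Per-channel CIFAR-10 statistics copied from the transforms in this notebook.
MEAN = (0.4914, 0.4822, 0.4465)
STD = (0.2023, 0.1994, 0.2010)

def normalize_pixel(rgb, mean=MEAN, std=STD):
    """Apply (value - mean) / std to each channel of one RGB pixel in [0, 1]."""
    return tuple((v - m) / s for v, m, s in zip(rgb, mean, std))

def denormalize_pixel(rgb, mean=MEAN, std=STD):
    """Invert normalize_pixel, e.g. before displaying an image."""
    return tuple(v * s + m for v, m, s in zip(rgb, mean, std))

# ToTensor() scales 8-bit values to [0, 1], so a mid-grey pixel (128, 128, 128)
# is divided by 255 before normalization is applied.
grey = tuple(c / 255 for c in (128, 128, 128))
normalized = normalize_pixel(grey)
print([round(v, 3) for v in normalized])
```

A pixel equal to the channel means maps to zero, and values are expressed in units of standard deviation, which keeps the inputs on a scale similar to what the pre-trained layers expect.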

In [1]:
!pip install torch torchvision


In [2]:
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
import torchvision.models as models

# Define transformations for the training set, flip the images randomly, crop them and normalize
transform_train = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])

# Define transformations for the test set
transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])

# Download and load the training data
trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform_train)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=128, shuffle=True, num_workers=2)

# Download and load the test data
testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform_test)
testloader = torch.utils.data.DataLoader(testset, batch_size=100, shuffle=False, num_workers=2)

# Define the device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

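# Load ResNet-18 with ImageNet pre-trained weights.
# The `weights` argument replaces the deprecated `pretrained=True` (torchvision >= 0.13).
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)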

# Modify the final fully connected layer to output 10 classes (for CIFAR-10)
num_ftrs = model.fc.in_features
model.fc = nn.Linear(num_ftrs, 10)

# Move the model to the device (GPU/CPU)
model = model.to(device)

# Define the loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

# Training the model
def train_model(model, criterion, optimizer, num_epochs=25):
    for epoch in range(num_epochs):  # loop over the dataset multiple times
        model.train()
        running_loss = 0.0
        for inputs, labels in trainloader:
            inputs, labels = inputs.to(device), labels.to(device)

            # zero the parameter gradients
            optimizer.zero_grad()

            # forward + backward + optimize
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()

            # accumulate the batch loss to report the epoch average
            running_loss += loss.item()

        print(f'Epoch [{epoch + 1}/{num_epochs}], Loss: {running_loss / len(trainloader):.4f}')

    print('Finished Training')

# Test the model
def test_model(model):
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for inputs, labels in testloader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            # index of the highest logit per image; `.data` is unnecessary under no_grad()
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    print(f'Accuracy of the network on the {total} test images: {100 * correct / total:.2f} %')

# Train and test the model
train_model(model, criterion, optimizer, num_epochs=25)
test_model(model)

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz


100%|██████████| 170498071/170498071 [00:08<00:00, 20444094.29it/s]


Extracting ./data/cifar-10-python.tar.gz to ./data
Files already downloaded and verified


Downloading: "https://download.pytorch.org/models/resnet18-f37072fd.pth" to /root/.cache/torch/hub/checkpoints/resnet18-f37072fd.pth
100%|██████████| 44.7M/44.7M [00:00<00:00, 129MB/s]


Epoch [1/25], Loss: 1.3116
Epoch [2/25], Loss: 0.8921
Epoch [3/25], Loss: 0.7764
Epoch [4/25], Loss: 0.7003
Epoch [5/25], Loss: 0.6450
Epoch [6/25], Loss: 0.6022
Epoch [7/25], Loss: 0.5718
Epoch [8/25], Loss: 0.5425
Epoch [9/25], Loss: 0.5147
Epoch [10/25], Loss: 0.4892
Epoch [11/25], Loss: 0.4653
Epoch [12/25], Loss: 0.4504
Epoch [13/25], Loss: 0.4279
Epoch [14/25], Loss: 0.4112
Epoch [15/25], Loss: 0.3938
Epoch [16/25], Loss: 0.3805
Epoch [17/25], Loss: 0.3673
Epoch [18/25], Loss: 0.3552
Epoch [19/25], Loss: 0.3387
Epoch [20/25], Loss: 0.3272
Epoch [21/25], Loss: 0.3148
Epoch [22/25], Loss: 0.3026
Epoch [23/25], Loss: 0.2924
Epoch [24/25], Loss: 0.2823
Epoch [25/25], Loss: 0.2734
Finished Training
Accuracy of the network on the 10000 test images: 83.84 %
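
The optimizer configured above is SGD with a learning rate of 0.001 and momentum of 0.9. As a minimal sketch of what one such update does, the code below reproduces the momentum rule documented for `torch.optim.SGD` with its default settings (no dampening, no Nesterov momentum) on a single scalar weight. The function name `sgd_momentum_step` and the toy objective `f(w) = w**2` are assumptions made for illustration and are not part of the notebook.

```python
def sgd_momentum_step(w, velocity, grad, lr=0.001, momentum=0.9):
    """One update: the velocity accumulates gradients, then the weight moves against it."""
    velocity = momentum * velocity + grad
    w = w - lr * velocity
    return w, velocity

# Toy objective f(w) = w**2 with gradient 2*w; this is not the network's real loss.
w, velocity = 1.0, 0.0
for step in range(3):
    grad = 2 * w
    w, velocity = sgd_momentum_step(w, velocity, grad)
    print(f"step {step + 1}: w={w:.6f}, velocity={velocity:.6f}")
```

Because the velocity carries over 90% of its previous value, consecutive gradients pointing in the same direction produce progressively larger steps, which is one reason a small learning rate such as 0.001 can still make steady progress.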


In [19]:
from PIL import Image
import numpy as np
import requests
from io import BytesIO

# Function to predict the class of a single image
def predict_image(image_url, model):
    model.eval()
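    # Download the image; fail early on HTTP errors instead of passing an error page to PIL
    response = requests.get(image_url, timeout=10)
    response.raise_for_status()
    # Force 3 channels so grayscale or RGBA images match the 3-channel normalization
    image = Image.open(BytesIO(response.content)).convert('RGB')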
    transform = transforms.Compose([
        transforms.Resize((32, 32)),
        transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
    ])
    image = transform(image).unsqueeze(0).to(device)

    # Predict the class
    with torch.no_grad():
        outputs = model(image)
        _, predicted = torch.max(outputs, 1)  # index of the highest logit

    # Get the class names
    classes = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
    return classes[predicted.item()]

In [20]:
# Example usage: classify three sample images hosted in the course repository
base_url = 'https://github.com/rhiosutoyo/Teaching-Deep-Learning-and-Its-Applications/blob/main/images/'
for filename in ('10-1-dog.jpg', '10-1-frog.jpg', '10-1-plane.jpg'):
    image_url = f'{base_url}{filename}?raw=true'
    predicted_class = predict_image(image_url, model)
    print(f'The predicted class for the input image is: {predicted_class}')

The predicted class for the input image is: dog
The predicted class for the input image is: frog
The predicted class for the input image is: plane
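
The predictions above report only the single most likely class. A possible extension, sketched here under stated assumptions, converts the raw model outputs (logits) into probabilities with softmax and lists the top candidates, which helps judge how confident the model is. The helper `top_k_predictions` is hypothetical, and `fake_logits` is made-up data standing in for `model(image)`; neither comes from the notebook.

```python
import torch

CLASSES = ('plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

def top_k_predictions(logits, k=3):
    """Return the k most likely (class name, probability) pairs for one image.

    `logits` is the raw model output with shape (1, 10).
    """
    probs = torch.softmax(logits, dim=1)        # convert scores to probabilities
    top_probs, top_idx = probs.topk(k, dim=1)   # k largest entries per row
    return [(CLASSES[i], p) for i, p in zip(top_idx[0].tolist(), top_probs[0].tolist())]

# Made-up logits standing in for model(image); not real model output.
fake_logits = torch.tensor([[0.1, -1.0, 0.3, 1.5, 0.0, 4.0, 0.2, 0.9, -0.5, -2.0]])
for name, prob in top_k_predictions(fake_logits):
    print(f'{name}: {prob:.3f}')
```

Inside `predict_image`, the same call could be applied to `outputs` in place of `torch.max` if a ranked list is preferred over a single label.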


# Suggested Activities
1. Please run the code and adjust the number of epochs, learning rate, and other hyperparameters as needed.
2. Please use your own example for testing the model.