<a href="https://colab.research.google.com/github/msa-1988/RNN_GRU_LSTM_MNIST_Classification/blob/main/RNN_GRU_LSTM_MNIST_Classification.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

The MNIST dataset is a widely used benchmark dataset in the field of machine learning and computer vision. It consists of a large collection of 28x28 grayscale images of handwritten digits from 0 to 9, along with their corresponding labels. The dataset is divided into two main parts: a training set containing 60,000 images and a test set containing 10,000 images.

The motivation behind using the MNIST dataset for classification tasks is twofold. Firstly, the dataset serves as a fundamental introduction to image classification problems, allowing researchers and practitioners to develop and evaluate their models on a relatively simple and well-understood task. Secondly, the dataset's simplicity and small size make it ideal for prototyping and benchmarking different machine learning algorithms and techniques.

The classification problem associated with the MNIST dataset is to build a model that can accurately classify the handwritten digits into their respective classes (0 to 9). This problem is a classic example of multi-class classification, where the goal is to assign a single label to each input image based on its visual features.

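In this notebook, I build three recurrent models to solve this problem: a vanilla RNN, a GRU, and an LSTM. Each model reads an image row by row. It treats the 28 rows as a sequence of 28 time steps, each holding 28 pixel values, and classifies the digit from the hidden state at the last time step.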
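Recurrent layers consume sequences, so each image has to be presented as one. The following is a minimal sketch of the reshape the notebook uses. It assumes a dummy batch of 4 images filled with consecutive numbers in place of real MNIST data; the batch size of 4 is illustrative only. The channel axis is dropped, and row t of each image becomes the input vector at time step t.

```python
import torch

# Dummy batch shaped like the tensors produced by the MNIST loader:
# (batch, channels, height, width) = (4, 1, 28, 28); values are placeholders
images = torch.arange(4 * 28 * 28, dtype=torch.float32).reshape(4, 1, 28, 28)

# Drop the channel axis: each image becomes 28 time steps (rows),
# and each time step is a 28-dimensional vector (the pixels in that row)
seq = images.reshape(-1, 28, 28)
print(seq.shape)  # torch.Size([4, 28, 28])

# Time step t of image 0 is exactly row t of that image
t = 5
assert torch.equal(seq[0, t], images[0, 0, t])
```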

The first step is to import the required libraries and check whether a GPU is available.

In [11]:
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms

# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')


Set the hyperparameters.

In [12]:

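# Hyper-parameters
num_classes = 10      # digits 0-9
num_epochs = 10
batch_size = 100
learning_rate = 0.001

# Each 28x28 image is fed as a sequence of 28 rows (time steps),
# where each row is a 28-dimensional input vector
input_size = 28       # features per time step (pixels per row)
sequence_length = 28  # number of time steps (rows per image)
hidden_size = 128     # size of the recurrent hidden state
num_layers = 2        # number of stacked recurrent layers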
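The following gives a sense of the model sizes these settings produce. In PyTorch, each recurrent layer holds one weight block for a plain RNN, three for a GRU (reset, update, new), and four for an LSTM (input, forget, cell, output). Each block has hidden_size × (layer input size + hidden_size) weights plus two bias vectors of length hidden_size. This sketch counts the parameters of freshly constructed modules built with the hyperparameters above. The numbers are derived here and are not reported in the original notebook. The helper count_params is a hypothetical name introduced for illustration.

```python
import torch.nn as nn

input_size, hidden_size, num_layers, num_classes = 28, 128, 2, 10

def count_params(module):
    """Hypothetical helper: total number of trainable parameters in a module."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)

head = nn.Linear(hidden_size, num_classes)  # same classifier head for all three models
counts = {}
for name, cls in [("RNN", nn.RNN), ("GRU", nn.GRU), ("LSTM", nn.LSTM)]:
    core = cls(input_size, hidden_size, num_layers, batch_first=True)
    counts[name] = count_params(core) + count_params(head)
    print(f"{name}: {counts[name]:,} parameters")
# RNN: 54,538 parameters
# GRU: 161,034 parameters
# LSTM: 214,282 parameters
```

The GRU and LSTM cores are exactly 3× and 4× the size of the plain RNN core (53,248 parameters), because they repeat the same per-gate block.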


Download the MNIST training and test sets from torchvision.datasets.

In [13]:

# MNIST dataset
train_dataset = torchvision.datasets.MNIST(root='./data',
                                           train=True,
                                           transform=transforms.ToTensor(),
                                           download=True)

test_dataset = torchvision.datasets.MNIST(root='./data',
                                          train=False,
                                          transform=transforms.ToTensor())

# Data loader
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)
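There are 60,000 training images and batch_size = 100, so the training loader yields 600 batches per epoch (the test loader yields 100). This explains the "/600" in the training log. The following is a minimal check. It assumes a dummy TensorDataset of 60,000 zeros in place of MNIST, since only the dataset length matters here.

```python
import math

import torch
from torch.utils.data import DataLoader, TensorDataset

batch_size = 100
# Stand-in for the 60,000 MNIST training samples (only the length matters here)
dummy_train = TensorDataset(torch.zeros(60_000, dtype=torch.long))
loader = DataLoader(dummy_train, batch_size=batch_size, shuffle=True)

print(len(loader))                     # 600 batches per epoch
print(math.ceil(60_000 / batch_size))  # 600, the same value computed directly
```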


Below is the RNN model, built with the nn.RNN module in PyTorch.

In [14]:
# Many-to-one vanilla RNN classifier: reads an image row by row and
# classifies it from the hidden state at the last time step
class RNN(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes):
        super(RNN, self).__init__()
        self.num_layers = num_layers
        self.hidden_size = hidden_size
        # batch_first=True -> x must have shape (batch_size, seq_length, input_size)
        self.rnn = nn.RNN(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # Initial hidden state, created on the same device as the input
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size, device=x.device)
        # x: (n, 28, 28), h0: (2, n, 128)

        # Forward propagate the RNN
        out, _ = self.rnn(x, h0)
        # out: (batch_size, seq_length, hidden_size) = (n, 28, 128)

        # Decode the hidden state of the last time step
        out = out[:, -1, :]
        # out: (n, 128)

        out = self.fc(out)
        # out: (n, 10)
        return out
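For reference, nn.RNN with its default tanh nonlinearity computes h_t = tanh(x_t W_ih^T + b_ih + h_{t-1} W_hh^T + b_hh) at every time step. The sketch below unrolls that recurrence by hand for a single layer. It reuses the module's own weights (weight_ih_l0, weight_hh_l0, bias_ih_l0, bias_hh_l0) and checks the result against nn.RNN. It also confirms that out[:, -1, :], which the model passes to the classifier, equals the last layer's final hidden state. The hidden size of 8 and the random input are illustrative assumptions, not the notebook's settings.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Illustrative sizes: 28 features and 28 steps as in the notebook,
# but a small hidden size of 8 to keep the check quick
input_size, hidden_size, seq_len, batch = 28, 8, 28, 3

rnn = nn.RNN(input_size, hidden_size, num_layers=1, batch_first=True)
x = torch.randn(batch, seq_len, input_size)

with torch.no_grad():
    # Manual unroll of h_t = tanh(x_t W_ih^T + b_ih + h_{t-1} W_hh^T + b_hh), h_0 = 0
    h = torch.zeros(batch, hidden_size)
    steps = []
    for t in range(seq_len):
        h = torch.tanh(x[:, t] @ rnn.weight_ih_l0.T + rnn.bias_ih_l0
                       + h @ rnn.weight_hh_l0.T + rnn.bias_hh_l0)
        steps.append(h)
    manual_out = torch.stack(steps, dim=1)  # (batch, seq_len, hidden_size)

    ref_out, h_n = rnn(x)  # h0 defaults to zeros when omitted

print(torch.allclose(manual_out, ref_out, atol=1e-5))         # True
# Unidirectional: the last output step equals the last layer's final hidden state
print(torch.allclose(ref_out[:, -1, :], h_n[-1], atol=1e-5))  # True
```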


Below is the GRU model, built with the nn.GRU module in PyTorch.

In [15]:
# Many-to-one GRU classifier: same structure as the RNN model above,
# with a gated recurrent unit as the recurrent core
class GRU(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes):
        super(GRU, self).__init__()
        self.num_layers = num_layers
        self.hidden_size = hidden_size
        # batch_first=True -> x must have shape (batch_size, seq_length, input_size)
        self.gru = nn.GRU(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # Initial hidden state, created on the same device as the input
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size, device=x.device)
        # x: (n, 28, 28), h0: (2, n, 128)

        # Forward propagate the GRU
        out, _ = self.gru(x, h0)
        # out: (batch_size, seq_length, hidden_size) = (n, 28, 128)

        # Decode the hidden state of the last time step
        out = out[:, -1, :]
        # out: (n, 128)

        out = self.fc(out)
        # out: (n, 10)
        return out
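The GRU adds two gates to the plain recurrence. The equations in the PyTorch nn.GRU documentation define a reset gate r and an update gate z. The reset gate controls how much of the previous state feeds the candidate n, and the update gate controls how much of the previous state is kept. The sketch below uses the same illustrative assumptions as before (one layer, hidden size 8, random input). It unrolls these equations by hand and compares the final state with nn.GRU.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
input_size, hidden_size, seq_len, batch = 28, 8, 28, 3  # illustrative sizes
gru = nn.GRU(input_size, hidden_size, num_layers=1, batch_first=True)
x = torch.randn(batch, seq_len, input_size)

with torch.no_grad():
    # PyTorch stacks the gate weights row-wise in the order (r, z, n)
    W_ir, W_iz, W_in = gru.weight_ih_l0.chunk(3, dim=0)
    W_hr, W_hz, W_hn = gru.weight_hh_l0.chunk(3, dim=0)
    b_ir, b_iz, b_in = gru.bias_ih_l0.chunk(3)
    b_hr, b_hz, b_hn = gru.bias_hh_l0.chunk(3)

    h = torch.zeros(batch, hidden_size)
    for t in range(seq_len):
        x_t = x[:, t]
        r = torch.sigmoid(x_t @ W_ir.T + b_ir + h @ W_hr.T + b_hr)     # reset gate
        z = torch.sigmoid(x_t @ W_iz.T + b_iz + h @ W_hz.T + b_hz)     # update gate
        n = torch.tanh(x_t @ W_in.T + b_in + r * (h @ W_hn.T + b_hn))  # candidate state
        h = (1 - z) * n + z * h  # keep a fraction z of the old state

    _, h_n = gru(x)

print(torch.allclose(h, h_n[-1], atol=1e-5))  # True
```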


Below is the LSTM model, built with the nn.LSTM module in PyTorch.

In [16]:
# Many-to-one LSTM classifier: like the GRU model, but the LSTM also
# carries a cell state, so it needs an initial (h0, c0) pair
class LSTM(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, num_classes):
        super(LSTM, self).__init__()
        self.num_layers = num_layers
        self.hidden_size = hidden_size
        # batch_first=True -> x must have shape (batch_size, seq_length, input_size)
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # Initial hidden and cell states, created on the same device as the input
        h0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size, device=x.device)
        c0 = torch.zeros(self.num_layers, x.size(0), self.hidden_size, device=x.device)
        # x: (n, 28, 28), h0 and c0: (2, n, 128)

        # Forward propagate the LSTM
        out, _ = self.lstm(x, (h0, c0))
        # out: (batch_size, seq_length, hidden_size) = (n, 28, 128)

        # Decode the hidden state of the last time step
        out = out[:, -1, :]
        # out: (n, 128)

        out = self.fc(out)
        # out: (n, 10)
        return out
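This class passes the tuple (h0, c0) because the LSTM carries a cell state c alongside the hidden state. The cell state is updated through input, forget, and output gates plus a tanh candidate g. The sketch below again assumes one layer, hidden size 8, and random input. It unrolls the nn.LSTM equations from the PyTorch documentation by hand, with gates stacked in the order i, f, g, o, and compares both final states with the module.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
input_size, hidden_size, seq_len, batch = 28, 8, 28, 3  # illustrative sizes
lstm = nn.LSTM(input_size, hidden_size, num_layers=1, batch_first=True)
x = torch.randn(batch, seq_len, input_size)

with torch.no_grad():
    W_ih, W_hh = lstm.weight_ih_l0, lstm.weight_hh_l0
    b = lstm.bias_ih_l0 + lstm.bias_hh_l0  # the two bias vectors are simply summed

    h = torch.zeros(batch, hidden_size)  # plays the role of h0
    c = torch.zeros(batch, hidden_size)  # plays the role of c0
    for t in range(seq_len):
        gates = x[:, t] @ W_ih.T + h @ W_hh.T + b  # (batch, 4 * hidden_size)
        i, f, g, o = gates.chunk(4, dim=1)          # stacked in the order i, f, g, o
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        g = torch.tanh(g)
        c = f * c + i * g      # forget part of the old cell state, write new content
        h = o * torch.tanh(c)  # expose a gated view of the cell state

    _, (h_n, c_n) = lstm(x)

print(torch.allclose(h, h_n[-1], atol=1e-5), torch.allclose(c, c_n[-1], atol=1e-5))  # True True
```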


Next, I initialize the three models. For each one, I train it with the Adam optimizer and cross-entropy loss, then evaluate its accuracy on the 10,000 test images.
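Two details of the training cell are worth isolating. First, the models return raw logits with no softmax, because nn.CrossEntropyLoss applies log-softmax internally. Second, accuracy takes the index of the largest logit as the predicted class. The following is a small sketch using hypothetical logits for 3 samples and 4 classes; these are made-up values, not notebook output.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical logits: 3 samples x 4 classes (raw scores, no softmax applied)
outputs = torch.tensor([[2.0, 0.5, -1.0, 0.1],
                        [0.2, 0.1, 3.0, -0.5],
                        [1.5, 1.7, 0.0, 0.3]])
labels = torch.tensor([0, 2, 0])

# nn.CrossEntropyLoss = log-softmax followed by negative log-likelihood
criterion = nn.CrossEntropyLoss()
loss = criterion(outputs, labels)
assert torch.allclose(loss, F.nll_loss(F.log_softmax(outputs, dim=1), labels))

# Accuracy: torch.max returns (values, indices); the indices are the predictions
_, predicted = torch.max(outputs, 1)
n_correct = (predicted == labels).sum().item()
acc = 100.0 * n_correct / labels.size(0)
print(predicted.tolist(), round(acc, 2))  # [0, 2, 1] 66.67
```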

In [17]:

model_rnn = RNN(input_size, hidden_size, num_layers, num_classes).to(device)
model_gru = GRU(input_size, hidden_size, num_layers, num_classes).to(device)
model_lstm = LSTM(input_size, hidden_size, num_layers, num_classes).to(device)


# Train and evaluate each model in turn
n_total_steps = len(train_loader)
for model in [model_rnn, model_gru, model_lstm]:
    print(f"Start training {model}\n")
    # Loss and optimizer (a fresh optimizer for each model)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

    model.train()
    for epoch in range(num_epochs):
        for i, (images, labels) in enumerate(train_loader):
            # origin shape: [N, 1, 28, 28]
            # resized: [N, 28, 28] -> 28 time steps of 28 features each
            images = images.reshape(-1, sequence_length, input_size).to(device)
            labels = labels.to(device)

            # Forward pass
            outputs = model(images)
            loss = criterion(outputs, labels)

            # Backward and optimize
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

            if (i+1) % 100 == 0:
                print(f'Epoch [{epoch+1}/{num_epochs}], Step [{i+1}/{n_total_steps}], Loss: {loss.item():.4f}')

    # Test the model
    # In the test phase, gradients are not needed (saves memory and time)
    model.eval()
    with torch.no_grad():
        n_correct = 0
        n_samples = 0
        for images, labels in test_loader:
            images = images.reshape(-1, sequence_length, input_size).to(device)
            labels = labels.to(device)
            outputs = model(images)
            # torch.max returns (values, indices); the indices are the predicted classes
            _, predicted = torch.max(outputs, 1)
            n_samples += labels.size(0)
            n_correct += (predicted == labels).sum().item()

        acc = 100.0 * n_correct / n_samples
        print(f'Accuracy of the {model} on the 10000 test images: {acc} % \n')

Start training RNN(
  (rnn): RNN(28, 128, num_layers=2, batch_first=True)
  (fc): Linear(in_features=128, out_features=10, bias=True)
)

Epoch [1/10], Step [100/600], Loss: 1.0881
Epoch [1/10], Step [200/600], Loss: 0.7272
Epoch [1/10], Step [300/600], Loss: 0.5096
Epoch [1/10], Step [400/600], Loss: 0.4605
Epoch [1/10], Step [500/600], Loss: 0.2920
Epoch [1/10], Step [600/600], Loss: 0.4853
Epoch [2/10], Step [100/600], Loss: 0.3459
Epoch [2/10], Step [200/600], Loss: 0.4100
Epoch [2/10], Step [300/600], Loss: 0.3591
Epoch [2/10], Step [400/600], Loss: 0.3898
Epoch [2/10], Step [500/600], Loss: 0.1693
Epoch [2/10], Step [600/600], Loss: 0.3356
Epoch [3/10], Step [100/600], Loss: 0.1605
Epoch [3/10], Step [200/600], Loss: 0.1760
Epoch [3/10], Step [300/600], Loss: 0.1672
Epoch [3/10], Step [400/600], Loss: 0.1899
Epoch [3/10], Step [500/600], Loss: 0.1691
Epoch [3/10], Step [600/600], Loss: 0.3268
Epoch [4/10], Step [100/600], Loss: 0.1995
Epoch [4/10], Step [200/600], Loss: 0.1237
Epo