<font color= #0cc754> <h1 align = "center">Digit Recognition with Deep Neural Networks</h1> </font>


## <font color= #0cc754> Table of Contents </font>

1. [Introduction](#1)
2. [Library Imports](#2)
3. [Reading and Preparing the Data](#3)
4. [The Model](#4)
    1. [Training the Model](#4.1)
    2. [Evaluating the Model](#4.2)
5. [Hyperparameter tuning the Model](#5)
6. [Convolutional Neural Network](#6)

<a id="1"></a>
## <font color= #0cc754> 1. Introduction </font>


This notebook is an organized write-up of Project 3: Digit recognition (Part 2) from the MITx online course `Machine Learning with Python-From Linear Models to Deep Learning`, which I took in 2024. It isn't a solution to that project, but it is based on it.

Our goal is to implement a neural network to classify MNIST digits, a famous database of handwritten digits, using the PyTorch library. We then tune its hyperparameters using Bayesian optimization. Finally, we use a Convolutional Neural Network (CNN) to do the same task.

<a id="2"></a>
## <font color= #0cc754> 2. Library Imports </font>


In [26]:
import pickle, gzip, numpy as np
from sklearn.model_selection import train_test_split
import torch
from torch.utils.data import DataLoader, TensorDataset
import torch.optim as optim
import torch.nn as nn
from hyperopt import fmin, tpe, hp, Trials, STATUS_OK
import torch.nn.functional as F
from tqdm import tqdm

<a id="3"></a>
## <font color= #0cc754> 3. Reading and Preparing the Data </font>


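We need to load the data from the file `mnist_data.pkl.gz`. The file already contains training, validation and test sets; we merge the first two so that we can make our own training/validation split afterwards.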

In [27]:
with gzip.open('./data/mnist_data.pkl.gz', 'rb') as f:
    train_set, valid_set, test_set = pickle.load(f, encoding='latin1')

X_train, y_train = train_set
X_valid, y_valid = valid_set
X_test, y_test = test_set
# Merge the provided training and validation sets into a single training set
X_train = np.vstack((X_train, X_valid))
y_train = np.hstack((y_train, y_valid))


To release memory and split the training data into training and dev (validation) sets, we do:

In [28]:
del train_set, valid_set, test_set, X_valid, y_valid

X_train, X_dev, y_train, y_dev = train_test_split(X_train, y_train, test_size=0.2, random_state=42, 
                                                 shuffle=True)


In [29]:
X_train.shape

(48000, 784)
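
As a quick sanity check on the shape above, we can work out the split sizes by hand. This is only a minimal sketch: `split_sizes` is a hypothetical helper, and it assumes the held-out count is the fraction `test_size` of the samples rounded up, which, as far as I know, is how a fractional `test_size` is handled.

```python
import math

def split_sizes(n_samples, test_size):
    """Return (n_train, n_dev) for a fractional hold-out split.

    Hypothetical helper: assumes the dev count is ceil(test_size * n_samples).
    """
    n_dev = math.ceil(test_size * n_samples)
    return n_samples - n_dev, n_dev

# 60000 samples in the merged training set, 20% held out for the dev set
n_train, n_dev = split_sizes(60000, 0.2)
print(n_train, n_dev)
```

The first number should match the number of rows reported for `X_train`.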

Next we convert our arrays into PyTorch tensors and batch them, which makes training our neural network more efficient.

In [30]:
train_x_tensor = torch.tensor(X_train, dtype=torch.float32)
train_y_tensor = torch.tensor(y_train, dtype=torch.long)
dev_x_tensor = torch.tensor(X_dev, dtype=torch.float32)
dev_y_tensor = torch.tensor(y_dev, dtype=torch.long)
test_x_tensor = torch.tensor(X_test, dtype=torch.float32)
test_y_tensor = torch.tensor(y_test, dtype=torch.long)


train_dataset = TensorDataset(train_x_tensor, train_y_tensor)
dev_dataset = TensorDataset(dev_x_tensor, dev_y_tensor)
test_dataset = TensorDataset(test_x_tensor, test_y_tensor)


batch_size = 32
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
dev_loader = DataLoader(dev_dataset, batch_size=batch_size, shuffle=False)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)
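
To get a feel for what batching does to the epoch length, here is a small sketch of the arithmetic. It assumes the loader keeps the final, possibly smaller, batch (the default behaviour when `drop_last` is not set). The helper names are hypothetical, and the 10000 in the second call is only an illustrative size, not a number taken from this notebook.

```python
import math

def num_batches(n_samples, batch_size):
    """Batches per epoch when the last, possibly smaller, batch is kept."""
    return math.ceil(n_samples / batch_size)

def last_batch_size(n_samples, batch_size):
    """Size of the final batch (equals batch_size when it divides evenly)."""
    remainder = n_samples % batch_size
    return remainder if remainder else batch_size

# Training set of this notebook: 48000 samples in batches of 32
print(num_batches(48000, 32), last_batch_size(48000, 32))
# Illustrative size that does not divide evenly by 32
print(num_batches(10000, 32), last_batch_size(10000, 32))
```

So one training epoch corresponds to 1500 optimizer steps with `batch_size = 32`.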

<a id="4"></a>
## <font color= #0cc754> 4. The Model </font>


We define a simple neural network with a single hidden layer of 10 units, so that training is reasonably fast. The output has size 10, since we are trying to predict 10 classes (the digits 0-9).

In [31]:
model = nn.Sequential(
            nn.Linear(784, 10),
            nn.ReLU(),
            nn.Linear(10, 10),
        )
lr=0.1
momentum=0
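
To see how small this model is, we can count its trainable parameters by hand. The sketch below uses a hypothetical helper, `dense_param_count`, and assumes every linear layer has a weight matrix plus a bias vector, as `nn.Linear` does by default.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases of a stack of fully connected layers.

    layer_sizes lists the width of every layer, input first, e.g. [784, 10, 10].
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix plus bias vector
    return total

# 784 inputs -> 10 hidden units -> 10 outputs
print(dense_param_count([784, 10, 10]))
```

Almost all of the parameters sit in the first layer, because it connects every one of the 784 pixels to each hidden unit.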

<a id="4.1"></a>
### <font color= #0cc754> 4.1 Training the Model </font>


Now we train the neural network (the model), computing the training accuracy and loss for each epoch. We also keep track of the validation-set accuracy and loss.

In [32]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=lr, momentum=momentum)

def train_one_epoch(model, train_loader, criterion, optimizer):
    model.train()
    running_loss = 0.0
    correct_predictions = 0
    total_predictions = 0
    for inputs, labels in train_loader:
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()

        # Calculate accuracy of training
        _, predicted = torch.max(outputs, 1)
        correct_predictions += (predicted == labels).sum().item()
        total_predictions += labels.size(0)
    return running_loss / len(train_loader), correct_predictions / total_predictions
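
To make the loss above less of a black box, here is a plain-Python sketch of what a cross-entropy loss on raw scores (logits) computes: the negative log-probability that a softmax assigns to the true class, averaged over the batch. The function names are hypothetical, and this illustrates the formula only, not PyTorch's actual implementation.

```python
import math

def cross_entropy(logits, label):
    """Negative log-probability of the true class under a softmax over logits."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

def batch_cross_entropy(batch_logits, labels):
    """Mean of the per-sample losses over a batch."""
    losses = [cross_entropy(z, y) for z, y in zip(batch_logits, labels)]
    return sum(losses) / len(losses)

# Uniform logits over 10 classes give a loss of log(10), the "chance" level.
print(batch_cross_entropy([[0.0] * 10], [3]))
```

A useful reference point: a model that has learned nothing about the 10 digits sits at a loss of about log(10) ≈ 2.30, so losses well below that mean the network is doing better than chance.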


In [33]:

def evaluate_one_epoch(model, dev_loader, criterion):
    model.eval()
    running_loss = 0.0
    correct_predictions = 0
    total_predictions = 0
    with torch.no_grad():
        for inputs, labels in dev_loader:
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            running_loss += loss.item()

            # Calculate accuracy of validation
            _, predicted = torch.max(outputs, 1)
            correct_predictions += (predicted == labels).sum().item()
            total_predictions += labels.size(0)
    return running_loss / len(dev_loader), correct_predictions / total_predictions
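
One detail worth noting: both functions above report `running_loss / len(loader)`, which is the average of the per-batch mean losses. When the last batch is smaller than the others, this differs slightly from the true per-sample average. The sketch below shows the difference with made-up numbers; the helper names are hypothetical.

```python
def mean_of_batch_means(batch_losses):
    """Average the per-batch mean losses, as the functions above do."""
    return sum(batch_losses) / len(batch_losses)

def per_sample_mean(batch_losses, batch_sizes):
    """Weight each batch's mean loss by its size to get the per-sample average."""
    total = sum(loss * n for loss, n in zip(batch_losses, batch_sizes))
    return total / sum(batch_sizes)

# Made-up values: two full batches of 32 and a final batch of 16.
losses = [0.20, 0.30, 0.50]
sizes = [32, 32, 16]
print(mean_of_batch_means(losses))
print(per_sample_mean(losses, sizes))
```

With hundreds of batches per epoch the effect is negligible, which is why the simpler average is fine here. The accuracy is unaffected, since it is already counted per sample.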


In [34]:

num_epochs = 10
for epoch in tqdm(range(num_epochs)):
    # Each helper returns a (loss, accuracy) pair
    train_loss, train_acc = train_one_epoch(model, train_loader, criterion, optimizer)
    val_loss, val_acc = evaluate_one_epoch(model, dev_loader, criterion)
    print(f"Training Loss: {train_loss:.4f}, Training Accuracy: {train_acc:.3f}"
          + f" Validation Loss: {val_loss:.4f}, Validation Accuracy: {val_acc:.3f}")

# Let's save the model
torch.save(model.state_dict(), 'model.pth')


 10%|█         | 1/10 [00:01<00:12,  1.42s/it]

Training Loss: 0.4765, Training Accuracy: 0.860 Validation Loss: 0.3157, Validation Accuracy: 0.911


 20%|██        | 2/10 [00:02<00:11,  1.45s/it]

Training Loss: 0.3133, Training Accuracy: 0.910 Validation Loss: 0.3239, Validation Accuracy: 0.904


 30%|███       | 3/10 [00:04<00:10,  1.44s/it]

Training Loss: 0.2885, Training Accuracy: 0.916 Validation Loss: 0.2706, Validation Accuracy: 0.923


 40%|████      | 4/10 [00:05<00:08,  1.46s/it]

Training Loss: 0.2713, Training Accuracy: 0.921 Validation Loss: 0.2711, Validation Accuracy: 0.923


 50%|█████     | 5/10 [00:07<00:07,  1.46s/it]

Training Loss: 0.2611, Training Accuracy: 0.925 Validation Loss: 0.2781, Validation Accuracy: 0.920


 60%|██████    | 6/10 [00:08<00:05,  1.46s/it]

Training Loss: 0.2527, Training Accuracy: 0.926 Validation Loss: 0.2689, Validation Accuracy: 0.922


 70%|███████   | 7/10 [00:10<00:04,  1.46s/it]

Training Loss: 0.2393, Training Accuracy: 0.931 Validation Loss: 0.2424, Validation Accuracy: 0.930


 80%|████████  | 8/10 [00:11<00:02,  1.45s/it]

Training Loss: 0.2281, Training Accuracy: 0.935 Validation Loss: 0.2443, Validation Accuracy: 0.929


 90%|█████████ | 9/10 [00:13<00:01,  1.45s/it]

Training Loss: 0.2202, Training Accuracy: 0.936 Validation Loss: 0.2659, Validation Accuracy: 0.924


100%|██████████| 10/10 [00:14<00:00,  1.46s/it]

Training Loss: 0.2163, Training Accuracy: 0.937 Validation Loss: 0.2399, Validation Accuracy: 0.930





<a id="4.2"></a>
### <font color= #0cc754> 4.2 Evaluating the Model </font>


Next, we evaluate our trained model on the test set, computing its loss and accuracy.

In [35]:
test_loss, test_acc = evaluate_one_epoch(model, test_loader, criterion)
print(f"Test Loss: {test_loss:.4f}, Test Accuracy: {test_acc:.3f}")

Test Loss: 0.2371, Test Accuracy: 0.931


<a id="5"></a>
## <font color= #0cc754> 5. Hyperparameter tuning the Model </font>


To be fair, a deep learning model has MANY hyperparameters we can tune: the number of hidden layers, the number of neurons in each hidden layer, the choice of optimizer, the learning rate, the momentum, and so on. Even the architecture of the model itself can be changed drastically. Here we optimize only three of them: the number of hidden units in our hidden layer, the momentum and the learning rate.

In [36]:
def objective(params):
    model = nn.Sequential(
        nn.Linear(784, int(params['hidden_units'])),
        nn.ReLU(),
        nn.Linear(int(params['hidden_units']), 10),
    )
    
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.SGD(model.parameters(), lr=params['lr'], momentum=params['momentum'])
    
    num_epochs = 5
    for epoch in range(num_epochs):
        train_loss, train_acc = train_one_epoch(model, train_loader, criterion, optimizer)
        val_loss, val_acc = evaluate_one_epoch(model, dev_loader, criterion)
    # We want to maximize the validation accuracy, so we return the negative of it 
    # because the hyperparameter optimizer minimizes the objective.
    return {'loss': -val_acc, 'status': STATUS_OK}
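
Since the momentum is one of the quantities being tuned, it helps to see what it does to a single update. The sketch below applies the classical momentum rule to one scalar parameter. The function name is hypothetical, and the rule is my understanding of what SGD with momentum does in its plain form (no dampening, no Nesterov), not code taken from the library.

```python
def sgd_momentum_step(param, grad, velocity, lr, momentum):
    """One SGD update with classical momentum on a single scalar parameter.

    velocity accumulates past gradients; momentum=0 reduces to plain SGD.
    """
    velocity = momentum * velocity + grad
    param = param - lr * velocity
    return param, velocity

# Minimise f(w) = w**2 (gradient 2*w) starting from w = 1.0.
w, v = 1.0, 0.0
for _ in range(20):
    w, v = sgd_momentum_step(w, 2 * w, v, lr=0.1, momentum=0.5)
print(w)
```

With `momentum=0` the velocity is just the current gradient, which is the setting used for the first model. Larger values let earlier gradients keep pushing the parameter in the same direction.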


In [37]:

# Define the search space
space = {
    'hidden_units': hp.quniform('hidden_units', 50, 200, 10),
    'lr': hp.loguniform('lr', -5, -1),
    'momentum': hp.uniform('momentum', 0, 1)
}
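
To make the ranges in this search space concrete, the sketch below draws from the same kinds of distributions in plain Python, following how the hyperopt documentation describes them: `quniform` as `round(uniform(low, high) / q) * q` and `loguniform` as `exp(uniform(low, high))`. The sampler names are hypothetical. Note that TPE does not draw independently like this; the sketch only illustrates which values the priors cover.

```python
import math
import random

def sample_quniform(rng, low, high, q):
    """round(uniform(low, high) / q) * q: a uniform draw snapped to multiples of q."""
    return round(rng.uniform(low, high) / q) * q

def sample_loguniform(rng, low, high):
    """exp(uniform(low, high)): uniform in log space, so small values are well covered."""
    return math.exp(rng.uniform(low, high))

rng = random.Random(0)  # fixed seed only to make the sketch repeatable
draws = [
    {
        "hidden_units": sample_quniform(rng, 50, 200, 10),
        "lr": sample_loguniform(rng, -5, -1),
        "momentum": rng.uniform(0, 1),
    }
    for _ in range(1000)
]
print(min(d["lr"] for d in draws), max(d["lr"] for d in draws))
```

In particular, the learning rate ranges from exp(-5) ≈ 0.0067 to exp(-1) ≈ 0.37, and the number of hidden units is always a multiple of 10 between 50 and 200.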


In [38]:

# Run the optimization
trials = Trials()
best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=50, trials=trials)

print("Best hyperparameters found: ", best)

100%|██████████| 50/50 [07:09<00:00,  8.59s/trial, best loss: -0.97725]           
Best hyperparameters found:  {'hidden_units': 200.0, 'lr': 0.14983990022656565, 'momentum': 0.6671579815534182}


So, with the hyperparameters above we reach an accuracy of 0.977 on the validation set after only 5 epochs of training. Recall that we were getting a validation accuracy of approximately 0.93 before.

<a id="6"></a>
## <font color= #0cc754> 6. Convolutional Neural Network </font>


Now we are going to use Convolutional Neural Networks (CNNs) to do the same task. These networks have demonstrated great performance on many deep learning tasks, especially in computer vision. We therefore expect a CNN to do a better job of recognizing the digits than our previous fully connected network.

We begin by loading the data again (we will need to reshape it this time).

In [39]:
with gzip.open('./data/mnist_data.pkl.gz', 'rb') as f:
    train_set, valid_set, test_set = pickle.load(f, encoding='latin1')

X_train, y_train = train_set
X_valid, y_valid = valid_set
X_test, y_test = test_set
X_train=np.vstack((X_train,X_valid))
y_train=np.hstack((y_train,y_valid))


In [40]:
X_train.shape

(60000, 784)

In [41]:
# We need to reshape the data into 1x28x28 images (the 1 is the single greyscale channel),
# since that's what our Convolutional Neural Network expects.

X_train = np.reshape(X_train, (X_train.shape[0], 1, 28, 28))
X_test = np.reshape(X_test, (X_test.shape[0], 1, 28, 28))

In [42]:
X_train.shape

(60000, 1, 28, 28)
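
To illustrate what the reshape does to a single image, here is a small pure-Python sketch. It assumes the 784 values are stored row by row (row-major order, which is also the default order of `np.reshape`); `unflatten` is a hypothetical helper.

```python
def unflatten(flat, height, width):
    """Reshape a flat, row-major list of pixels into a list of rows."""
    assert len(flat) == height * width
    return [flat[r * width:(r + 1) * width] for r in range(height)]

# Use the flat index itself as the "pixel value" so positions are easy to follow.
flat = list(range(28 * 28))
image = unflatten(flat, 28, 28)
print(image[1][0], image[27][27])
```

Under that assumption, the pixel at row `r` and column `c` of the image is element `r * 28 + c` of the flat vector.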

We split our training data into training and dev (development, or validation) sets. As before, we convert them into PyTorch tensors and batch them.

In [43]:
del train_set, valid_set, test_set, X_valid, y_valid

X_train, X_dev, y_train, y_dev = train_test_split(X_train, y_train, test_size=0.2, random_state=42, 
                                                 shuffle=True)


In [44]:
train_x_tensor = torch.tensor(X_train, dtype=torch.float32)
train_y_tensor = torch.tensor(y_train, dtype=torch.long)
dev_x_tensor = torch.tensor(X_dev, dtype=torch.float32)
dev_y_tensor = torch.tensor(y_dev, dtype=torch.long)
test_x_tensor = torch.tensor(X_test, dtype=torch.float32)
test_y_tensor = torch.tensor(y_test, dtype=torch.long)


train_dataset = TensorDataset(train_x_tensor, train_y_tensor)
dev_dataset = TensorDataset(dev_x_tensor, dev_y_tensor)
test_dataset = TensorDataset(test_x_tensor, test_y_tensor)


batch_size = 32
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)
dev_loader = DataLoader(dev_dataset, batch_size=batch_size, shuffle=False)
test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)

Now we define the model. What we want is:
* A convolutional layer with 32 filters of size 3x3

* A ReLU nonlinearity

* A max pooling layer with size 2x2

* A convolutional layer with 64 filters of size 3x3

* A ReLU nonlinearity

* A max pooling layer with size 2x2

* A flatten layer

* A fully connected layer with 128 neurons

* A dropout layer with drop probability 0.5

* A fully-connected layer with 10 neurons

In [45]:
model = nn.Sequential(
          nn.Conv2d(1, 32, (3, 3)),
          nn.ReLU(),
          nn.MaxPool2d((2, 2)),
          nn.Conv2d(32, 64, (3, 3)),
          nn.ReLU(),
          nn.MaxPool2d((2, 2)),
          nn.Flatten(),
          nn.Linear(1600, 128),
          nn.Dropout(0.5),
          nn.Linear(128, 10),
        )
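
The `1600` in the first linear layer is not arbitrary: it is the number of values left after the two convolution and pooling stages. The sketch below traces the spatial size through the network, assuming the default settings used above (stride 1 and no padding for the convolutions, pooling stride equal to the pooling size). `conv_out` and `pool_out` are hypothetical helpers.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial size after a convolution (defaults: stride 1, no padding)."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel):
    """Spatial size after max pooling whose stride equals the kernel size."""
    return (size - kernel) // kernel + 1

size = 28
size = conv_out(size, 3)   # first 3x3 convolution
size = pool_out(size, 2)   # first 2x2 max pool
size = conv_out(size, 3)   # second 3x3 convolution
size = pool_out(size, 2)   # second 2x2 max pool
flat_features = 64 * size * size  # 64 channels from the second convolution
print(size, flat_features)
```

The spatial size goes 28 → 26 → 13 → 11 → 5, so the flatten layer outputs 64 × 5 × 5 = 1600 features per image.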

Training this kind of network can take a while. My GPU is compatible with CUDA, which allows the training to run on the GPU. I won't be using it here, but (after installing PyTorch with CUDA support) you can check whether it is available using

In [46]:
torch.cuda.is_available()

True

I will be using the CPU for this example, but you can switch to the GPU by setting the `device` variable to `torch.device("cuda")`.

In [47]:
device = torch.device("cpu")

In [49]:
# Move the model to the device
model.to(device)
# We need to update the train_one_epoch and evaluate_one_epoch functions to move data to the device
def train_one_epoch(model, train_loader, criterion, optimizer, device):
    model.train()
    running_loss = 0.0
    correct_predictions = 0
    total_predictions = 0
    for inputs, labels in train_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()

        # Calculate accuracy of training
        _, predicted = torch.max(outputs, 1)
        correct_predictions += (predicted == labels).sum().item()
        total_predictions += labels.size(0)
    return running_loss / len(train_loader), correct_predictions / total_predictions

def evaluate_one_epoch(model, dev_loader, criterion, device):
    model.eval()
    running_loss = 0.0
    correct_predictions = 0
    total_predictions = 0
    with torch.no_grad():
        for inputs, labels in dev_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            loss = criterion(outputs, labels)
            running_loss += loss.item()

            # Calculate accuracy of validation
            _, predicted = torch.max(outputs, 1)
            correct_predictions += (predicted == labels).sum().item()
            total_predictions += labels.size(0)
    return running_loss / len(dev_loader), correct_predictions / total_predictions

# Training loop with device
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0)
num_epochs = 10
for epoch in tqdm(range(num_epochs)):
    # Each helper returns a (loss, accuracy) pair
    train_loss, train_acc = train_one_epoch(model, train_loader, criterion, optimizer, device)
    val_loss, val_acc = evaluate_one_epoch(model, dev_loader, criterion, device)
    print(f"Training Loss: {train_loss:.4f}, Training Accuracy: {train_acc:.3f}"
          + f" Validation Loss: {val_loss:.4f}, Validation Accuracy: {val_acc:.3f}")

# Save the model
#torch.save(model.state_dict(), 'model.pth')

 10%|█         | 1/10 [00:13<01:59, 13.26s/it]

Training Loss: 0.0212, Training Accuracy: 0.993 Validation Loss: 0.0354, Validation Accuracy: 0.990


 20%|██        | 2/10 [00:25<01:41, 12.74s/it]

Training Loss: 0.0190, Training Accuracy: 0.994 Validation Loss: 0.0406, Validation Accuracy: 0.988


 30%|███       | 3/10 [00:38<01:28, 12.58s/it]

Training Loss: 0.0174, Training Accuracy: 0.994 Validation Loss: 0.0386, Validation Accuracy: 0.990


 40%|████      | 4/10 [00:50<01:15, 12.60s/it]

Training Loss: 0.0171, Training Accuracy: 0.994 Validation Loss: 0.0358, Validation Accuracy: 0.991


 50%|█████     | 5/10 [01:02<01:02, 12.49s/it]

Training Loss: 0.0139, Training Accuracy: 0.996 Validation Loss: 0.0476, Validation Accuracy: 0.989


 60%|██████    | 6/10 [01:15<00:49, 12.50s/it]

Training Loss: 0.0141, Training Accuracy: 0.995 Validation Loss: 0.0402, Validation Accuracy: 0.990


 70%|███████   | 7/10 [01:27<00:37, 12.47s/it]

Training Loss: 0.0118, Training Accuracy: 0.996 Validation Loss: 0.0429, Validation Accuracy: 0.989


 80%|████████  | 8/10 [01:40<00:24, 12.45s/it]

Training Loss: 0.0114, Training Accuracy: 0.996 Validation Loss: 0.0517, Validation Accuracy: 0.989


 90%|█████████ | 9/10 [01:52<00:12, 12.47s/it]

Training Loss: 0.0118, Training Accuracy: 0.996 Validation Loss: 0.0413, Validation Accuracy: 0.991


100%|██████████| 10/10 [02:05<00:00, 12.53s/it]

Training Loss: 0.0115, Training Accuracy: 0.996 Validation Loss: 0.0397, Validation Accuracy: 0.991





So, we see here that even without any hyperparameter tuning the CNN reaches a validation accuracy of about 0.99, compared with 0.93 for the first fully connected network and 0.977 for that network after hyperparameter tuning.