**Challenge: Implement a Multiclass Classification Neural Network using PyTorch**

Objective:
Build a neural network using PyTorch to predict handwritten digits of MNIST.

Steps:

1. **Data Preparation**: Load the MNIST dataset using ```torchvision.datasets.MNIST```. Standardize/normalize the features. Split the dataset into training and testing sets using, for example, ```sklearn.model_selection.train_test_split()```. **Bonus scores**: *use PyTorch's built-in* ```DataLoader``` *to split the dataset*.

2. **Neural Network Architecture**: Define a simple feedforward neural network using PyTorch's ```nn.Module```. Design the input layer to match the number of features in the MNIST dataset and the output layer to have as many neurons as there are classes (10). You can experiment with the number of hidden layers and neurons to optimize the performance. **Bonus scores**: *Make your architecture flexible enough to have as many hidden layers as the user wants, and use hyperparameter optimization to select the best number of hidden layers.*

3. **Loss Function and Optimizer**: Choose an appropriate loss function for multiclass classification. Select an optimizer, like SGD (Stochastic Gradient Descent) or Adam.

4. **Training**: Write a training loop to iterate over the dataset.
Forward pass the input through the network, calculate the loss, and perform backpropagation. Update the weights of the network using the chosen optimizer.

5. **Testing**: Evaluate the trained model on the test set. Calculate the accuracy of the model.

6. **Optimization**: Experiment with hyperparameters (learning rate, number of epochs, etc.) to optimize the model's performance. Consider adjusting the neural network architecture for better results. **Notice that you can't use the optimization algorithms from scikit-learn that we saw in lab1: e.g.,** ```GridSearchCV```.
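
Since step 6 rules out ```GridSearchCV```, one option is a plain loop over every hyperparameter combination. The sketch below is only an illustration: `evaluate` is a hypothetical stand-in for "train a model and return its validation loss", and the search space values are made up.

```python
from itertools import product


def evaluate(learning_rate, hidden_layers):
    """Hypothetical stand-in for 'train a model and return its validation loss'.

    A real version would build the network, train it, and score it on held-out data.
    """
    return (learning_rate - 0.01) ** 2 + 0.001 * abs(hidden_layers - 2)


def grid_search(grid):
    """Try every combination in `grid` and return (best_params, best_score)."""
    names = list(grid)
    best_params, best_score = None, float("inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(**params)
        if score < best_score:  # lower loss is better
            best_params, best_score = params, score
    return best_params, best_score


# Assumed search space, chosen only for illustration.
search_space = {"learning_rate": [0.001, 0.01, 0.1], "hidden_layers": [1, 2, 3]}
best_params, best_score = grid_search(search_space)
print(best_params, best_score)
```

The cost grows with the product of the list lengths, which is why samplers such as Optuna's are often preferred for larger search spaces.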


# **FIRST POINT**

In [None]:
import torch.nn as nn
import torch.optim as optim
import torchvision
import numpy as np
import torch
from torch.utils.data import DataLoader
from torchvision import transforms

In [None]:
change = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.8,))])
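
As a rough check of what this transform does: `ToTensor()` scales pixel values to [0, 1], and `Normalize(mean, std)` then computes `(x - mean) / std` per channel. The sketch below uses plain Python and a hypothetical helper `normalize` (not part of torchvision) to show the value range that results from the mean and standard deviation chosen here. The commonly quoted MNIST statistics (mean about 0.1307, std about 0.3081) are included only for comparison.

```python
def normalize(pixel, mean, std):
    """Same arithmetic as transforms.Normalize for a single value."""
    return (pixel - mean) / std


# Range of the normalized pixels for the values used in this notebook.
notebook_range = (normalize(0.0, 0.5, 0.8), normalize(1.0, 0.5, 0.8))

# Range when using the commonly quoted MNIST statistics instead.
mnist_range = (normalize(0.0, 0.1307, 0.3081), normalize(1.0, 0.1307, 0.3081))

print(notebook_range)
print(mnist_range)
```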

In [None]:
train_set = torchvision.datasets.MNIST(root='./data', train=True, download= True, transform=change)
test_set = torchvision.datasets.MNIST(root='./data', train=False, download= True, transform=change)

In [None]:
print(f'train {train_set}')
print(f'test {test_set}')

train Dataset MNIST
    Number of datapoints: 60000
    Root location: ./data
    Split: Train
    StandardTransform
Transform: Compose(
               ToTensor()
               Normalize(mean=(0.5,), std=(0.8,))
           )
test Dataset MNIST
    Number of datapoints: 10000
    Root location: ./data
    Split: Test
    StandardTransform
Transform: Compose(
               ToTensor()
               Normalize(mean=(0.5,), std=(0.8,))
           )


In [None]:
batch_size = 32

In [None]:
custom_trainloader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
custom_testloader = DataLoader(test_set, batch_size=batch_size, shuffle=False)  # sample order does not matter for evaluation
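
If a separate validation set is wanted (for example, for hyperparameter tuning without touching the test set), `torch.utils.data.random_split` can carve one out of the training data. This is a minimal sketch under assumptions: the synthetic `TensorDataset` stands in for MNIST, and the 80/20 split sizes are arbitrary.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Synthetic stand-in for MNIST: 100 fake 1x28x28 images with labels in 0..9.
images = torch.randn(100, 1, 28, 28)
labels = torch.randint(0, 10, (100,))
dataset = TensorDataset(images, labels)

# A seeded generator makes the split reproducible.
generator = torch.Generator().manual_seed(0)
train_part, val_part = random_split(dataset, [80, 20], generator=generator)

train_loader = DataLoader(train_part, batch_size=32, shuffle=True)
val_loader = DataLoader(val_part, batch_size=32, shuffle=False)

print(len(train_part), len(val_part), len(train_loader), len(val_loader))
```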

# **SECOND POINT**

In [None]:
class MNISTConvNet(nn.Module):

    def __init__(self, hidden_layers):
        super().__init__()
        #First layer
        self.flatten1 = nn.Flatten()
        self.fc_layer1 = nn.Linear(784, 300)
        self.activation1 = nn.Sigmoid()
        #Hidden layer
        self.hidden_layers = nn.ModuleList([nn.Linear(300, 300) for _ in range(hidden_layers)])
        self.activation2 = nn.Sigmoid()
        # Last layer
        self.flatten2 = nn.Flatten()
        self.fc_layer2 = nn.Linear(300, 100)
        self.activation3 = nn.Sigmoid()
        self.fc_layer3 = nn.Linear(100, 10)

    def forward(self, x):
        # First Layer
        x = self.flatten1(x)
        x = self.fc_layer1(x)
        x = self.activation1(x)

        # Hidden layers: each linear layer is followed by the sigmoid activation
        for layer in self.hidden_layers:
            x = self.activation2(layer(x))

        x = self.flatten2(x)
        x = self.fc_layer2(x)
        x = self.activation3(x)
        x = self.fc_layer3(x)

        return x

In [None]:
hidden_layers = 5

In [None]:
model = MNISTConvNet(hidden_layers)
model

MNISTConvNet(
  (flatten1): Flatten(start_dim=1, end_dim=-1)
  (fc_layer1): Linear(in_features=784, out_features=300, bias=True)
  (activation1): Sigmoid()
  (hidden_layers): ModuleList(
    (0-4): 5 x Linear(in_features=300, out_features=300, bias=True)
  )
  (activation2): Sigmoid()
  (flatten2): Flatten(start_dim=1, end_dim=-1)
  (fc_layer2): Linear(in_features=300, out_features=100, bias=True)
  (activation3): Sigmoid()
  (fc_layer3): Linear(in_features=100, out_features=10, bias=True)
)

In [None]:
class MNISTConvNet(nn.Module):
    def __init__(self, hidden_layers):
        super().__init__()

        self.first = nn.Sequential(
            nn.Flatten(),
            nn.Linear(784, 300),
            nn.Sigmoid()
        )

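        # nn.ModuleList registers the hidden blocks as submodules, so their
        # parameters are returned by model.parameters() and reach the optimizer
        self.list_modules = nn.ModuleList()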
        for _ in range(hidden_layers):
            self.list_modules.append(nn.Sequential(
                nn.Linear(300, 300),
                nn.Sigmoid()
            ))

        self.last = nn.Sequential(
            nn.Flatten(),
            nn.Linear(300, 100),
            nn.Sigmoid(),
            nn.Linear(100, 10)
        )

    def forward(self, x):
        x = self.first(x)
        for block in self.list_modules:
            x = block(x)
        x = self.last(x)

        return x

In [None]:
model = MNISTConvNet(hidden_layers)
model

MNISTConvNet(
  (first): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=784, out_features=300, bias=True)
    (2): Sigmoid()
  )
  (last): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=300, out_features=100, bias=True)
    (2): Sigmoid()
    (3): Linear(in_features=100, out_features=10, bias=True)
  )
)
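
Whether hidden blocks appear in the printed model, and in `model.parameters()`, depends on the container that holds them. The sketch below compares a plain Python list with `nn.ModuleList`, using a hypothetical `Holder` class that is not part of the notebook; the layer sizes are arbitrary.

```python
import torch.nn as nn


class Holder(nn.Module):
    """Toy module that stores three linear layers in one of two containers."""

    def __init__(self, use_module_list):
        super().__init__()
        blocks = [nn.Linear(4, 4) for _ in range(3)]
        # A plain list is stored as an ordinary attribute;
        # nn.ModuleList registers each layer as a submodule.
        self.blocks = nn.ModuleList(blocks) if use_module_list else blocks


def count_parameters(module):
    """Number of scalar parameters that an optimizer would receive."""
    return sum(p.numel() for p in module.parameters())


plain = Holder(use_module_list=False)
registered = Holder(use_module_list=True)
print(count_parameters(plain), count_parameters(registered))
```

Layers kept in a plain list still run in `forward`, but an optimizer built from `model.parameters()` never updates them, so they stay at their random initial values.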

# **THIRD POINT**

In [None]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
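
`nn.CrossEntropyLoss` expects raw logits and integer class labels, which is why the network ends in a `Linear` layer without a softmax. As a small illustration with made-up logits, the loss matches `log_softmax` followed by the negative log-likelihood loss:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Made-up logits for 2 samples and 3 classes, with their true class indices.
logits = torch.tensor([[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]])
targets = torch.tensor([0, 2])

loss_direct = nn.CrossEntropyLoss()(logits, targets)
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(loss_direct.item(), loss_manual.item())
```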

# **FOURTH and FIFTH POINT**

In [None]:
n_epochs = 10

In [None]:
for current_epoch in range(n_epochs):
    epoch_losses = []


    for batch_inputs, batch_labels in custom_trainloader:   #Training
        predictions = model(batch_inputs)
        current_loss = criterion(predictions, batch_labels)

        epoch_losses.append(current_loss.item())

        optimizer.zero_grad()
        current_loss.backward()
        optimizer.step()

    print(f'Epoch {current_epoch + 1} --> loss = {np.mean(epoch_losses)}')

    correct_predictions = 0
    total_samples = 0
    with torch.no_grad():  # gradients are not needed for evaluation
        for test_inputs, test_labels in custom_testloader:
            test_predictions = model(test_inputs)

            correct_predictions += (torch.argmax(test_predictions, 1) == test_labels).sum().item()
            total_samples += len(test_labels)

    if total_samples != 0:
        accuracy = correct_predictions / total_samples
        print(f'Epoch {current_epoch + 1} --> accuracy = {accuracy * 100:.2f}%')
    else:
        print(f'Epoch {current_epoch + 1} --> accuracy not available (total_samples is zero)')


Epoch 1 --> loss = 2.3027390914916994
Epoch 1 --> accuracy = 11.35%
Epoch 2 --> loss = 2.302634429295858
Epoch 2 --> accuracy = 11.35%
Epoch 3 --> loss = 2.302769232304891
Epoch 3 --> accuracy = 11.35%
Epoch 4 --> loss = 2.3025958488464355
Epoch 4 --> accuracy = 11.35%
Epoch 5 --> loss = 2.302762683232625
Epoch 5 --> accuracy = 11.35%
Epoch 6 --> loss = 2.302504721577962
Epoch 6 --> accuracy = 11.35%
Epoch 7 --> loss = 2.3025542709350586
Epoch 7 --> accuracy = 10.10%
Epoch 8 --> loss = 2.3023181499481202
Epoch 8 --> accuracy = 11.35%
Epoch 9 --> loss = 2.3024417259216308
Epoch 9 --> accuracy = 11.35%
Epoch 10 --> loss = 2.3023575803120933
Epoch 10 --> accuracy = 10.28%
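
The loss values above stay close to 2.3026. That number is ln 10, the cross-entropy obtained when a classifier assigns probability 1/10 to each of the 10 classes, so the log suggests that the network's output is essentially independent of its input. An accuracy that hovers around 10 to 11% is consistent with the same picture. A short check of the arithmetic:

```python
import math

num_classes = 10

# Cross-entropy of a uniform prediction: -log(1 / num_classes) = log(num_classes)
uniform_loss = -math.log(1 / num_classes)

print(round(uniform_loss, 4))
```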


# **SIXTH POINT**

In [None]:
!pip install optuna
import optuna



In [None]:
def custom_objective(trial):

    num_hidden_layers = trial.suggest_int('num_hidden_layers', 5, 8)
    num_epochs = trial.suggest_int('num_epochs', 10, 15)

    custom_model = MNISTConvNet(num_hidden_layers)
    custom_optimizer = optim.SGD(custom_model.parameters(), lr=0.001, momentum=0.9)

    for current_epoch in range(num_epochs):

        total_losses = 0

        for custom_input, custom_label in custom_trainloader:  # one pass over the training data
            custom_y_pred = custom_model(custom_input)

            custom_loss = criterion(custom_y_pred, custom_label)

            total_losses += custom_loss.item()

            custom_optimizer.zero_grad()
            custom_loss.backward()
            custom_optimizer.step()

        result = total_losses / len(custom_trainloader)

    return result


In [None]:
custom_study = optuna.create_study()
custom_study.optimize(custom_objective, n_trials=5, n_jobs=-1)



[I 2023-12-21 21:07:27,996] A new study created in memory with name: no-name-2f1ff1b9-54f5-45a5-8eb9-fe247d5bb2a9
[I 2023-12-21 21:18:18,842] Trial 1 finished with value: 2.3028564802805582 and parameters: {'num_hidden_layers': 8, 'num_epochs': 12}. Best is trial 1 with value: 2.3028564802805582.
[I 2023-12-21 21:20:27,009] Trial 0 finished with value: 2.3026950992584228 and parameters: {'num_hidden_layers': 7, 'num_epochs': 15}. Best is trial 0 with value: 2.3026950992584228.
[I 2023-12-21 21:30:18,595] Trial 2 finished with value: 2.302688578923543 and parameters: {'num_hidden_layers': 8, 'num_epochs': 13}. Best is trial 2 with value: 2.302688578923543.
[I 2023-12-21 21:33:22,155] Trial 3 finished with value: 2.302760189565023 and parameters: {'num_hidden_layers': 8, 'num_epochs': 14}. Best is trial 2 with value: 2.302688578923543.
[I 2023-12-21 21:36:08,390] Trial 4 finished with value: 2.302731950759888 and parameters: {'num_hidden_layers': 6, 'num_epochs': 12}. Best is trial 2 wit

In [None]:
print(custom_study.best_params)

{'num_hidden_layers': 8, 'num_epochs': 13}


In [None]:
from optuna.visualization import plot_parallel_coordinate

plot_parallel_coordinate(custom_study)

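The best trial (the dark blue line in the plot) uses num_hidden_layers = 8 and num_epochs = 13. Note, however, that all five trials finish with a loss between about 2.3027 and 2.3029, so the differences between the configurations are negligible.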
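
The trial values reported in the Optuna log can also be compared directly. The sketch below copies the five final losses from that log into a dictionary (the name `trial_losses` is just an illustrative choice) and computes the best trial and the spread between best and worst.

```python
# Final training losses per trial, copied from the Optuna log above.
trial_losses = {
    0: 2.3026950992584228,
    1: 2.3028564802805582,
    2: 2.302688578923543,
    3: 2.302760189565023,
    4: 2.302731950759888,
}

best_trial = min(trial_losses, key=trial_losses.get)
spread = max(trial_losses.values()) - min(trial_losses.values())

print(best_trial, spread)
```

A spread this small relative to the loss itself suggests that the search did not find a configuration that behaves differently from the others.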

In [None]:
best_model = MNISTConvNet(8)
best_model

MNISTConvNet(
  (first): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=784, out_features=300, bias=True)
    (2): Sigmoid()
  )
  (last): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=300, out_features=100, bias=True)
    (2): Sigmoid()
    (3): Linear(in_features=100, out_features=10, bias=True)
  )
)

Next, retrain the model with the best parameters and evaluate it.

In [None]:
n_epochs = 13

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(best_model.parameters(), lr=0.001, momentum=0.9) #SGD = Stochastic gradient descent

for epoch in range(n_epochs):

  losses = []
  #training
  for inputs, labels in custom_trainloader:

    pred = best_model(inputs)
    loss = criterion(pred, labels)

    losses.append(loss.item())

    optimizer.zero_grad()

    loss.backward()

    optimizer.step()

  print(f'Epoch {epoch + 1} --> loss = {np.mean(losses)}')

  acc = 0
  count = 0
  with torch.no_grad():  # gradients are not needed for evaluation
    for inputs, labels in custom_testloader:

      pred = best_model(inputs)

      acc += (torch.argmax(pred,1) == labels).float().sum()
      count += len(labels)

  acc /= count

  print(f'Epoch {epoch + 1} --> model accuracy = {acc * 100}')

Epoch 1 --> loss = 2.303599220275879
Epoch 1 --> model accuracy = 10.100000381469727
Epoch 2 --> loss = 2.302980726114909
Epoch 2 --> model accuracy = 11.350000381469727
Epoch 3 --> loss = 2.3032508056640624
Epoch 3 --> model accuracy = 10.09000015258789
Epoch 4 --> loss = 2.3030198240915936
Epoch 4 --> model accuracy = 11.350000381469727
Epoch 5 --> loss = 2.3030609155019124
Epoch 5 --> model accuracy = 10.09000015258789
Epoch 6 --> loss = 2.3030564697265623
Epoch 6 --> model accuracy = 11.350000381469727
Epoch 7 --> loss = 2.3029387963612873
Epoch 7 --> model accuracy = 11.350000381469727
Epoch 8 --> loss = 2.302702960840861
Epoch 8 --> model accuracy = 11.350000381469727
Epoch 9 --> loss = 2.3030445901234944
Epoch 9 --> model accuracy = 11.350000381469727
Epoch 10 --> loss = 2.3026863758087157
Epoch 10 --> model accuracy = 11.350000381469727
Epoch 11 --> loss = 2.302600602086385
Epoch 11 --> model accuracy = 11.350000381469727
Epoch 12 --> loss = 2.302832009887695
Epoch 12 --> model

Regrettably, I was unable to improve the accuracy, which appears to have plateaued at about 11.35%.