<div style="line-height:0.5">
<h1 style="color:#BF66F2 ">  Multi-layer Perceptron in PyTorch </h1>
<h4> 3 examples with various MLP implementations. </h4>
<span style="display: inline-block;">
    <h3 style="color: lightblue; display: inline;">Keywords:</h3>
    Dataset creation + kaiming_uniform_ + xavier_uniform_ + nn.Dropout(dropout_rate)
</span>
</div>

In [34]:
import warnings 
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

from sklearn.metrics import accuracy_score, classification_report
from sklearn.preprocessing import LabelEncoder
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torchvision.datasets import MNIST
import torchvision.transforms as transforms
from torch.nn.init import xavier_uniform_, kaiming_uniform_
from torch.utils.data import Dataset, DataLoader, random_split

<h3 style="color:#BF66F2 ">  Recap: </h3>
<div style="margin-top: -25px;">
Traditional Multi-Layer Perceptrons (MLPs) are composed primarily of torch Linear (fully connected or dense) layers.
</div>

<h2 style="color:#BF66F2 ">  <u> Example #1 </u> </h2>
Based on the 'ionosphere' dataset

<h3 style="color:#BF66F2 ">  Note: </h3>
<div style="margin-top: -25px;">
Creating a custom CSVDataset class (as a subclass of torch.utils.data.Dataset) is beneficial for:

- Encapsulating all pre-processing preliminar steps like normalization, encoding categorical variables, etc.
- Loading efficienctly and process data in batches to avoid loading the entire dataset into memory. 
- Controlling the data loading and transformation process, in a flexibile manner such as implementing on-the-fly data augmentation, dealing with missing data, or handling imbalanced datasets.
- Ensuring compatibility with PyTorch's data loading utilities, such as DataLoader, which handles batching, shuffling, and multi-process data loading.

In [None]:
class CSVDataset(Dataset):
    """ Custom PyTorch dataset class for working with CSV data.

    Args:
        Path to the CSV file [str]

    Attributes:
        - X : Features data [numpy.ndarray]
        - y : Labels data [numpy.ndarray]

    Methods:
        - __len__(self): Get the number of samples in the dataset.
        - __getitem__(self, idx): Get a data sample (features and label) at a specified index.
        - get_splits(self, n_test=0.33): Split the dataset into training and testing sets.

    """    
    def __init__(self, path):
        """ Initialize the dataset by loading and preprocessing data from a CSV file.

        Args:
            Path to the CSV file [str]
        """
        # Read data from a CSV file into a DataFrame
        df = pd.read_csv(path, header=None)
        
        ##### Get features (X) and labels (y) from the DataFrame
        # Extract all columns except the last one as features
        self.X = df.values[:, :-1]  
        # Extract the last column as labels
        self.y = df.values[:, -1]    
        
        # Convert feature data type to float32
        self.X = self.X.astype('float32')
        # Encode labels using LabelEncoder to convert them into numerical values
        self.y = LabelEncoder().fit_transform(self.y)
        # Convert label data type to float32
        self.y = self.y.astype('float32')
        
        # Reshape labels to be a column vector
        self.y = self.y.reshape((len(self.y), 1))

    def __len__(self):
        """ Get the number of samples in the dataset. """
        return len(self.X)

    def __getitem__(self, idx):
        """ Get a data sample (features and label) at a specified index.

        Args:
            Index of the sample to retrieve [int]

        Returns:
            Features and the corresponding label [list]
        """        
        return [self.X[idx], self.y[idx]]

    def get_splits(self, n_test=0.33):
        """ Split the dataset (without using sklearn) into training and testing sets, according to the size received. 

        Args:
            Proportion of the dataset to include in the test split [float]

        Returns:
            Training and testing datasets [tuple]
        """        
        # Determine sizes
        test_size = round(n_test * len(self.X))
        train_size = len(self.X) - test_size
        # Split
        return random_split(self, [train_size, test_size])                       

<h3 style="color:#BF66F2 ">  Recap: </h3>
<div style="margin-top: -25px;">

The 'kaiming_uniform_' and the 'xavier_uniform_' are 2 weight initialization methods. <br>
The first is suitable for layers that use the Rectified Linear Unit (ReLU) activation function. <br>
The latter is used for layers that use sigmoid or hyperbolic tangent (tanh) activation functions. <br>

Setting initial values for the weights: <br>
- Favouring training convergence 
- Avoiding vanishing or exploding gradients

1. Kaiming uniform (He) initialization <br> 

$std = \sqrt{2 / (fan\_in + a)}$

2. Xavier uniform (Glorot) initialization <br>

$std = \sqrt{2 / (fan\_in + fan\_out)}$
</div>

In [None]:
class MLP(nn.Module):
    """ Multi-Layer Perceptron (MLP) Neural Network model.

    Args:
        Number of input features [int]

    Attributes:
        - First hidden layer with 10 units [nn.Linear]
        - ReLU activation function for the first hidden layer [nn.ReLU]
        - Second hidden layer with 8 units [nn.Linear]
        - ReLU activation function for the second hidden layer [nn.ReLU]
        - Output layer with 1 unit [nn.Linear]
        - Sigmoid activation function for the output layer [nn.Sigmoid]
    """    
    def __init__(self, n_inputs):
        super(MLP, self).__init__()
        #### First hidden layer with 10 units
        self.hidden1 = nn.Linear(n_inputs, 10)
        # Initialize the weights of the first hidden layer using Kaiming (He) initialization
        kaiming_uniform_(self.hidden1.weight, nonlinearity='relu')
        # Define the ReLU activation function for the first hidden layer
        self.act1 = nn.ReLU()
        
        #### Second hidden layer
        self.hidden2 = nn.Linear(10, 8)
        # Initialize the weights of the second hidden layer using Kaiming (He) initialization
        kaiming_uniform_(self.hidden2.weight, nonlinearity='relu')
        self.act2 = nn.ReLU()
        
        #### Third hidden layer and output
        self.hidden3 = nn.Linear(8, 1)
        xavier_uniform_(self.hidden3.weight)
        # Initialize the weights of the output layer using Xavier (Glorot) initialization
        self.act3 = nn.Sigmoid()

    def forward(self, X):
        """ Forward pass to propagate outputs through the MLP.

        Args:
            Input tensor with shape (batch_size, n_inputs)

        Details: 
            - Apply the first hidden layer
            - Apply ReLU activation to the first hidden layer output
            - Apply the second hidden layer
            - Apply ReLU activation to the second hidden layer output
            - Apply the output layer
            - Apply Sigmoid activation to the output layer output
        
        Returns:
            Output tensor with shape (batch_size, 1)\\
            It represents the network's predictions for each input sample.
        """
        X = self.hidden1(X)  
        X = self.act1(X)     
        X = self.hidden2(X)  
        X = self.act2(X)     
        X = self.hidden3(X)  
        X = self.act3(X)     
        return X

In [7]:
def prepare_data(path):
    """ Prepare the dataset. """ 
    # Create a dataset object from a CSV file
    dataset = CSVDataset(path)
    # Split the dataset into training and testing sets
    train, test = dataset.get_splits()
    ## Prepare data loaders for training and testing sets
    train_dl = DataLoader(train, batch_size=32, shuffle=True)
    test_dl = DataLoader(test, batch_size=1024, shuffle=False)
    
    return train_dl, test_dl

def train_model(train_dl, model):
    """ Perform training for a given train dataloader on the received model. """
    # Define the loss function (Binary Cross Entropy)
    criterion = nn.BCELoss()
    # Define the optimizer (Stochastic Gradient Descent)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    # For all epochs and all mini-batches
    for epoch in range(100):
        for i, (inputs, targets) in enumerate(train_dl):
            # Clear the gradients
            optimizer.zero_grad()
            # Compute the model output
            yhat = model(inputs)
            # Calculate loss
            loss = criterion(yhat, targets)
            # Assign credit 
            loss.backward()
            # Ipdate model weights
            optimizer.step()

def evaluate_model(test_dl, model):
    """ Calculate accuracy of the model, given some unseen samples. 
    
    Details: 
        - detach() creates a tensor (from the computation graph) that shares storage with 'yhat'\\
        It will not have its gradients computed during backpropagation.\\
        Detaching the tensor is crucial when dealing with non-leaf tensors\\
        (those that are not directly resulting from .forward() computations but are intermediary nodes in the computational graph)\\ 
        to avoid PyTorch throwing an error related to trying to access a non-leaf tensor's values! 
    """
    predictions, actuals = list(), list()
    for i, (inputs, targets) in enumerate(test_dl):
        # Evaluate the model on the test set
        yhat = model(inputs)
        ## Retrieve numpy arrays
        yhat = yhat.detach().numpy()
        actual = targets.numpy()
        # Reshape
        actual = actual.reshape((len(actual), 1))
        # Round to class values
        yhat = yhat.round()
        ## Store
        predictions.append(yhat)
        actuals.append(actual)
    predictions, actuals = np.vstack(predictions), np.vstack(actuals)
    # Calculate accuracy
    acc = accuracy_score(actuals, predictions)
    
    return acc

def predict(row, model):
    """ Predict class for one sample. """
    # Convert row to data
    row = torch.Tensor([row])
    # Make prediction
    yhat = model(row)
    # Retrieve numpy array
    yhat = yhat.detach().numpy()
    
    return yhat

In [60]:
""" Prepare data """
path = 'https://raw.githubusercontent.com/jbrownlee/Datasets/master/ionosphere.csv'
train_dl, test_dl = prepare_data(path)
example = train_dl.dataset[:3]
print(len(train_dl.dataset), len(test_dl.dataset))
print(type(train_dl.dataset))
example

235 116
<class 'torch.utils.data.dataset.Subset'>

[array([[ 1.0000e+00,  0.0000e+00,  1.0000e+00,  0.0000e+00,  1.0000e+00,
         0.0000e+00,  6.6667e-01,  1.1111e-01,  1.0000e+00, -1.1111e-01,
         8.8889e-01, -1.1111e-01,  1.0000e+00, -2.2222e-01,  7.7778e-01,
         0.0000e+00,  7.7778e-01,  0.0000e+00,  1.0000e+00, -1.1111e-01,
         7.7778e-01, -1.1111e-01,  6.6667e-01, -1.1111e-01,  6.6667e-01,
         0.0000e+00,  9.0347e-01, -5.3520e-02,  1.0000e+00,  1.1111e-01,
         8.8889e-01, -1.1111e-01,  1.0000e+00,  0.0000e+00],
       [ 1.0000e+00,  0.0000e+00,  3.6876e-01, -1.0000e+00, -1.0000e+00,
        -1.0000e+00, -7.6610e-02,  1.0000e+00,  1.0000e+00,  9.5041e-01,
         7.4597e-01, -3.8710e-01, -1.0000e+00, -7.9313e-01, -9.6770e-02,
         1.0000e+00,  4.8684e-01,  4.6502e-01,  3.1755e-01, -2.7461e-01,
        -1.4343e-01, -2.0188e-01, -1.1976e-01,  6.8950e-02,  3.0210e-02,
         6.6390e-02,  3.4430e-02, -1.1860e-02, -4.0300e-03, -1.6720e-02,
        -7.

In [None]:
for i in range(len(train_dl.dataset)//29):   #integer division (//) operator to ensure that the result is an integer
    example = train_dl.dataset[i]
    print(example)

<h2 style="color:#BF66F2 ">  <u> Example #2 </u> </h2>
Based on custom dataset

In [41]:
class MyMLP2(nn.Module):
    """  A Multi-Layer Perceptron (MLP) model with three linear layers (which apply a linear transformation to incoming data) and a binary output. 
    This class is utilized 
    for binary classification tasks. It uses ReLU activation functions for hidden layers and a Sigmoid 
    activation function for the output layer.

    Attributes:
        - hidden1: First hidden layer, containing 15 neurons [torch.nn.Module]
            - Weight initialization: Kaiming (He) initialization for ReLU activation
        - act1: ReLU Activation function applied after the first hidden layer [torch.nn.Module]
        - hidden2: Second hidden layer, containing 12 neurons [torch.nn.Module]
            - Weight initialization: Kaiming (He) initialization for ReLU activation
        - act2: ReLU Activation function  [torch.nn.Module]
        - hidden3: Third hidden layer, containing 10 neurons[torch.nn.Module]
            - Weight initialization: Kaiming (He) initialization for ReLU activation.
        - act3: ReLU Activation function [torch.nn.Module]
        - output: Output layer, containing 1 neuron [torch.nn.Module]
        - act4 : Activation function (Sigmoid) applied after the output layer, squashing outputs to the [0, 1] range [torch.nn.Module]
    """    

    def __init__(self, n_inputs):
        super(MyMLP2, self).__init__()
        self.hidden1 = nn.Linear(n_inputs, 15)
        kaiming_uniform_(self.hidden1.weight, nonlinearity='relu')
        self.act1 = nn.ReLU()
        
        self.hidden2 = nn.Linear(15, 12)
        kaiming_uniform_(self.hidden2.weight, nonlinearity='relu')
        self.act2 = nn.ReLU()
        
        self.hidden3 = nn.Linear(12, 10)
        kaiming_uniform_(self.hidden3.weight, nonlinearity='relu')
        self.act3 = nn.ReLU()
        
        self.output = nn.Linear(10, 1)
        xavier_uniform_(self.output.weight)
        self.act4 = nn.Sigmoid()

    def forward(self, X):
        """ Forward pass through the network.
        
        Parameters:
            Input tensor containing features [torch.Tensor, shape (batch_size, n_inputs)]
            
        Returns:
            Output tensor after passing through the network [torch.Tensor, shape is (batch_size, 1)]
        """        
        X = self.hidden1(X)  
        X = self.act1(X)     
        X = self.hidden2(X)  
        X = self.act2(X)     
        X = self.hidden3(X)  
        X = self.act3(X)
        X = self.output(X)  
        X = self.act4(X)     
        return X


In [42]:
class CSVDataset(Dataset):
    """
    CSVDataset: A PyTorch Dataset for handling CSV files.
    
    Args:
        - Path to the CSV file [str]
        - Column names (or indices) used as input features [list]
        - Column name (or index) used as target/label [str or int]
        - Optional transform to be applied on a sample

    Attributes:
        - The entire dataset loaded as a Pandas DataFrame [pd.DataFrame]
        - Column names (or indices) used as input features [list]
        - Column name (or index) used as target/label [str or int]
        - Optional transform to be applied on a sample 

    """    
    def __init__(self, path):
        """ Initialize the CSVDataset instance.

        Notes:
            - '.values' converts the last column selected by iloc (a Pandas Series) to a NumPy array.
            - reshape(-1, 1) changes the array to have one column and as many rows as necessary to preserve the number of elements.
        """
        df = pd.read_csv(path)
        self.X = df.iloc[:, :-1].values.astype(np.float32)
        self.y = df.iloc[:, -1].values.astype(np.float32).reshape(-1, 1)

    def __len__(self):
        """ Get the total number of samples in the dataset. """
        return len(self.X)

    def __getitem__(self, idx):
        """ Retrieve a sample at the given index. """
        return [self.X[idx], self.y[idx]]

    def get_splits(self, test_size=0.33):
        # Split the data
        X_train, X_test, y_train, y_test = train_test_split(self.X, self.y, test_size=test_size, random_state=42, stratify=self.y)
        train = [X_train, y_train]
        test = [X_test, y_test]
        return CSVSplits(train), CSVSplits(test)

class CSVSplits(Dataset):
    def __init__(self, xy):
        self.X, self.y = xy[0], xy[1]

    def __len__(self):
        return len(self.X)

    def __getitem__(self, idx):
        return [self.X[idx], self.y[idx]]


In [43]:
""" Generate a synthetic dataset, convert it to a DataFrame, and store it as CSV file. """
X, y = make_classification(
    n_samples=1000,
    n_features=20,
    n_informative=15,
    n_classes=2,
    random_state=42
)

data = pd.DataFrame(X, columns=[f'feature_{i}' for i in range(X.shape[1])])
data['label'] = y
data.to_csv('./data/synthetic_data_mlp11.csv', index=False)

In [44]:
path_2 = './data/synthetic_data_mlp11.csv'
train_dl, test_dl = prepare_data(path_2)
example = train_dl.dataset[:3]

type(train_dl.dataset), len(train_dl.dataset), train_dl.dataset[:3]

(__main__.CSVSplits,
 670,
 [array([[-3.3660216e+00,  4.6903701e+00, -2.9365644e-01, -6.2550503e-01,
           1.9373842e+00,  5.3089440e-01, -7.8419065e+00, -9.6214857e+00,
           1.5070822e+00, -3.5070446e+00,  6.5736854e-01, -1.9820571e+00,
           3.5781031e+00, -8.1942484e-02, -2.2932630e+00,  7.3670134e-02,
          -5.7234077e+00, -6.2933826e-01,  5.5940595e+00,  7.3103750e-01],
         [ 3.9878318e+00, -1.4504327e+00, -7.4221528e-01, -4.4320866e-02,
           8.1662476e-01,  2.2038660e+00, -3.2856840e-01, -1.6053659e+00,
           6.5204924e-01, -5.5203280e+00,  3.0548160e+00, -3.5256939e+00,
           1.1853160e+00, -1.1811910e+00, -1.1965318e+00, -8.5692817e-01,
          -1.5044899e+00,  4.9679227e+00,  1.1657474e+01, -6.3480270e-01],
         [-9.5817977e-01,  4.1054419e-01,  1.4526561e+00, -3.4971721e+00,
           3.0058694e-01,  6.7987144e-01,  2.9035997e-01, -3.8141093e+00,
          -1.1357025e+00, -3.0209696e+00,  9.3515891e-01, -2.5258760e+00,
         

In [48]:
def train_model_2(train_dl, model):
    """ Perform training for a given train dataloader on the received model. """
    ###### Some loss functions
    criterion0 = nn.BCELoss()
    criterion1 = nn.BCEWithLogitsLoss()
    criterion2 = nn.MSELoss()
    criterion3 = nn.MarginRankingLoss()
    criterion4 = nn.HingeEmbeddingLoss()
    criterion5 = nn.SoftMarginLoss()
    
    ###### Some optimizers
    optimizer0 = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    optimizer1 = optim.LBFGS(model.parameters(), lr=1e-2)
    optimizer2 = optim.Rprop(model.parameters(), lr=1e-2)
    optimizer3 = optim.ASGD(model.parameters(), lr=1e-2)
    optimizer4 = optim.AdamW(model.parameters(), lr=1e-2)
    optimizer5 = optim.Adamax(model.parameters(), lr=1e-2)

    # For all epochs and all mini-batches...
    for epoch in range(100):
        for i, (inputs, targets) in enumerate(train_dl):
            def closure():
                """ Now i need to define a clousure function that is necessary
                for the LBFGS optimizer to RE-evaluate the model and RE-compute the gradients, to minimize the objective function, due to its quasi-Newton nature.
                """
                # Clear the gradients
                optimizer1.zero_grad()
                # Compute the model output
                yhat = model(inputs)
                # Calculate loss
                loss = criterion1(yhat, targets)
                # Assign credit 
                loss.backward()
                return loss

            # Update model weights using the closure function
            optimizer1.step(closure)

def predict_2(test_dl, model):
    """ Calculate metrics.\\ 
    Check evaluate_model method above. 
    """
    predictions, actuals = list(), list()
    for i, (inputs, targets) in enumerate(test_dl):
        yhat = model(inputs)
        yhat = yhat.detach().numpy()
        actual = targets.numpy()
        actual = actual.reshape((len(actual), 1))
        yhat = yhat.round()
        predictions.append(yhat)
        actuals.append(actual)
    predictions, actuals = np.vstack(predictions), np.vstack(actuals)
    
    return predictions, actuals

In [46]:
n_inputs = train_dl.dataset[0][0].shape[0]
model_2 = MyMLP2(n_inputs)
model_2

MyMLP2(
  (hidden1): Linear(in_features=20, out_features=15, bias=True)
  (act1): ReLU()
  (hidden2): Linear(in_features=15, out_features=12, bias=True)
  (act2): ReLU()
  (hidden3): Linear(in_features=12, out_features=10, bias=True)
  (act3): ReLU()
  (output): Linear(in_features=10, out_features=1, bias=True)
  (act4): Sigmoid()
)

In [49]:
# Training
with warnings.catch_warnings():
    warnings.filterwarnings("ignore", category=UserWarning)
    train_model_2(train_dl, model_2)

In [16]:
test_arr = [
    -8.3762154e-01,  9.4032541e-01, -2.9565234e-01, -5.7823402e-01,
    1.2473841e+00,  4.9238140e-01, -7.1245905e-01, -9.3821475e-01,
    1.1074532e+00, -3.1046456e-01,  6.2431584e-01, -1.7843570e-01,
    3.4785401e-01, -7.2924824e-02, -2.1325610e-01,  7.3725134e-02,
    -5.5123057e-01, -6.1438266e-01,  5.7940594e-01,  8.2131500e-01
    ]

# Make predictions 
y_pred = predict(test_arr, model_2)
# Evaluate
acc = evaluate_model(test_dl, model_2)

print(y_pred)
print('Accuracy: %.3f' % acc)

[[0.9523599]]
Accuracy: 0.918


In [25]:
act, predictions = predict_2(test_dl, model_2)
report = classification_report(act, predictions, zero_division=1)
print("Classification Report:")
print(report)

Classification Report:
              precision    recall  f1-score   support

         0.0       0.89      0.94      0.92       156
         1.0       0.95      0.90      0.92       174

    accuracy                           0.92       330
   macro avg       0.92      0.92      0.92       330
weighted avg       0.92      0.92      0.92       330



<h4 style="color:#BF66F2 ">  => Change loss and optimizer #1 </h4>

In [50]:
def train_model_2_1(train_dl, model):
    """ Perform training for a given train dataloader on the received model.\\
        Check 'train_model_2()' for further info. """
    ###### Some loss functions
    criterion0 = nn.BCELoss()
    criterion1 = nn.BCEWithLogitsLoss()
    criterion2 = nn.MSELoss()
    criterion3 = nn.MarginRankingLoss()
    criterion4 = nn.HingeEmbeddingLoss()
    criterion5 = nn.SoftMarginLoss()
    
    ###### Some optimizers
    optimizer0 = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    optimizer1 = optim.LBFGS(model.parameters(), lr=1e-2)
    optimizer2 = optim.Rprop(model.parameters(), lr=1e-2)
    optimizer3 = optim.ASGD(model.parameters(), lr=1e-2)
    optimizer4 = optim.AdamW(model.parameters(), lr=1e-2)
    optimizer5 = optim.Adamax(model.parameters(), lr=1e-2)

    for epoch in range(100):
        for i, (inputs, targets) in enumerate(train_dl):
            def closure():
                optimizer1.zero_grad()
                yhat = model(inputs)
                loss = criterion1(yhat, targets)
                loss.backward()
                return loss

            optimizer1.step(closure)

In [51]:
n_inputs = train_dl.dataset[0][0].shape[0]
model_2 = MyMLP2(n_inputs)
model_2

MyMLP2(
  (hidden1): Linear(in_features=20, out_features=15, bias=True)
  (act1): ReLU()
  (hidden2): Linear(in_features=15, out_features=12, bias=True)
  (act2): ReLU()
  (hidden3): Linear(in_features=12, out_features=10, bias=True)
  (act3): ReLU()
  (output): Linear(in_features=10, out_features=1, bias=True)
  (act4): Sigmoid()
)

In [52]:
# Training
with warnings.catch_warnings():
    warnings.filterwarnings("ignore", category=UserWarning)
    train_model_2_1(train_dl, model_2)

In [53]:
test_arr = [
    -8.3762154e-01,  9.4032541e-01, -2.9565234e-01, -5.7823402e-01,
    1.2473841e+00,  4.9238140e-01, -7.1245905e-01, -9.3821475e-01,
    1.1074532e+00, -3.1046456e-01,  6.2431584e-01, -1.7843570e-01,
    3.4785401e-01, -7.2924824e-02, -2.1325610e-01,  7.3725134e-02,
    -5.5123057e-01, -6.1438266e-01,  5.7940594e-01,  8.2131500e-01
    ]

# Make predictions 
y_pred = predict(test_arr, model_2)
# Evaluate
acc = evaluate_model(test_dl, model_2)

print(y_pred)
print('Accuracy: %.3f' % acc)

[[1.]]
Accuracy: 0.718


In [54]:
act, predictions = predict_2(test_dl, model_2)
report = classification_report(act, predictions, zero_division=1)
print("Classification Report:")
print(report)

Classification Report:
              precision    recall  f1-score   support

         0.0       0.51      0.88      0.64        96
         1.0       0.93      0.65      0.77       234

    accuracy                           0.72       330
   macro avg       0.72      0.76      0.71       330
weighted avg       0.81      0.72      0.73       330



<h4 style="color:#BF66F2 ">  => Change loss and optimizer #2 </h4>

In [61]:
def train_model_2_2(train_dl, model):
    """ Perform training for a given train dataloader on the received model.\\
        Check 'train_model_2()' for further info. """
    margin = 0.5  
    
    ###### Some loss functions
    criterion0 = nn.BCELoss()
    criterion1 = nn.BCEWithLogitsLoss()
    criterion2 = nn.MSELoss()
    criterion3 = nn.MarginRankingLoss(margin)
    criterion4 = nn.HingeEmbeddingLoss()
    criterion5 = nn.SoftMarginLoss()
    
    ###### Some optimizers
    optimizer0 = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
    optimizer1 = optim.LBFGS(model.parameters(), lr=1e-2)
    optimizer2 = optim.Rprop(model.parameters(), lr=1e-2)
    optimizer3 = optim.ASGD(model.parameters(), lr=1e-2)
    optimizer4 = optim.AdamW(model.parameters(), lr=1e-2)
    optimizer5 = optim.Adamax(model.parameters(), lr=1e-2)

    for epoch in range(100):
        for i, (inputs, targets) in enumerate(train_dl):
            def closure():
                optimizer4.zero_grad()
                yhat = model(inputs)
                # Create x since loss need tensors as input
                baseline = torch.zeros_like(yhat) 
                loss = criterion3(yhat, baseline, targets)
                loss.backward()
                return loss

            optimizer4.step(closure)

In [62]:
n_inputs = train_dl.dataset[0][0].shape[0]
model_2 = MyMLP2(n_inputs)
model_2

MyMLP2(
  (hidden1): Linear(in_features=20, out_features=15, bias=True)
  (act1): ReLU()
  (hidden2): Linear(in_features=15, out_features=12, bias=True)
  (act2): ReLU()
  (hidden3): Linear(in_features=12, out_features=10, bias=True)
  (act3): ReLU()
  (output): Linear(in_features=10, out_features=1, bias=True)
  (act4): Sigmoid()
)

In [63]:
# Training
with warnings.catch_warnings():
    warnings.filterwarnings("ignore", category=UserWarning)
    train_model_2_2(train_dl, model_2)

In [64]:
test_arr = [
    -8.3762154e-01,  9.4032541e-01, -2.9565234e-01, -5.7823402e-01,
    1.2473841e+00,  4.9238140e-01, -7.1245905e-01, -9.3821475e-01,
    1.1074532e+00, -3.1046456e-01,  6.2431584e-01, -1.7843570e-01,
    3.4785401e-01, -7.2924824e-02, -2.1325610e-01,  7.3725134e-02,
    -5.5123057e-01, -6.1438266e-01,  5.7940594e-01,  8.2131500e-01
    ]

# Make predictions 
y_pred = predict(test_arr, model_2)
# Evaluate
acc = evaluate_model(test_dl, model_2)

print(y_pred)
print('Accuracy: %.3f' % acc)

[[0.7745221]]
Accuracy: 0.500


In [65]:
act, predictions = predict_2(test_dl, model_2)
report = classification_report(act, predictions, zero_division=1)
print("Classification Report:")
print(report)

Classification Report:
              precision    recall  f1-score   support

         0.0       0.00      1.00      0.00         0
         1.0       1.00      0.50      0.67       330

    accuracy                           0.50       330
   macro avg       0.50      0.75      0.33       330
weighted avg       1.00      0.50      0.67       330



<h4 style="color:#BF66F2 ">  => Change loss and optimizer #3 </h4>

In [66]:
def train_model_2_2(train_dl, model):
    """ Perform training for a given train dataloader on the received model.\\
        Check 'train_model_2()' for further info. """
    criterion4 = nn.HingeEmbeddingLoss()
    optimizer2 = optim.Rprop(model.parameters(), lr=1e-2)

    for epoch in range(100):
        for i, (inputs, targets) in enumerate(train_dl):
            optimizer2.zero_grad()  
            yhat = model(inputs)    
            loss = criterion4(yhat, targets) 
            loss.backward()  
            optimizer2.step() 


In [67]:
n_inputs = train_dl.dataset[0][0].shape[0]
model_2 = MyMLP2(n_inputs)
model_2

MyMLP2(
  (hidden1): Linear(in_features=20, out_features=15, bias=True)
  (act1): ReLU()
  (hidden2): Linear(in_features=15, out_features=12, bias=True)
  (act2): ReLU()
  (hidden3): Linear(in_features=12, out_features=10, bias=True)
  (act3): ReLU()
  (output): Linear(in_features=10, out_features=1, bias=True)
  (act4): Sigmoid()
)

In [68]:
# Training
with warnings.catch_warnings():
    warnings.filterwarnings("ignore", category=UserWarning)
    train_model_2_2(train_dl, model_2)

In [69]:
test_arr = [
    -8.3762154e-01,  9.4032541e-01, -2.9565234e-01, -5.7823402e-01,
    1.2473841e+00,  4.9238140e-01, -7.1245905e-01, -9.3821475e-01,
    1.1074532e+00, -3.1046456e-01,  6.2431584e-01, -1.7843570e-01,
    3.4785401e-01, -7.2924824e-02, -2.1325610e-01,  7.3725134e-02,
    -5.5123057e-01, -6.1438266e-01,  5.7940594e-01,  8.2131500e-01
    ]

# Make predictions 
y_pred = predict(test_arr, model_2)
# Evaluate
acc = evaluate_model(test_dl, model_2)

print(y_pred)
print('Accuracy: %.3f' % acc)

[[0.]]
Accuracy: 0.500


In [70]:
act, predictions = predict_2(test_dl, model_2)
report = classification_report(act, predictions, zero_division=1)
print("Classification Report:")
print(report)

Classification Report:
              precision    recall  f1-score   support

         0.0       1.00      0.50      0.67       330
         1.0       0.00      1.00      0.00         0

    accuracy                           0.50       330
   macro avg       0.50      0.75      0.33       330
weighted avg       1.00      0.50      0.67       330



<h2 style="color:#BF66F2 ">  <u> Example #3 </u> </h2>
Based on the MNIST dataset

In [30]:
class MyMLP3(nn.Module):
    """A PyTorch Multi-Layer Perceptron model for multiclass classification.
    
    Args:
        - Number of input features [int]
        - Number of classes (output units) [int]
        - Dropout rate for dropout layer [float, optional, default is 0.2]

    Attributes:
        - First hidden layer with 64 units [nn.Linear]
        - Second hidden layer with 32 units [nn.Linear]
        - Third hidden layer with 16 units[nn.Linear]
        - Dropout layer to reduce overfitting [nn.Dropout]
        - Output layer with units equal to the number of classes [nn.Linear]
    """

    def __init__(self, input_size, output_size, dropout_rate=0.2):
        super(MyMLP3, self).__init__()
        self.layer1 = nn.Linear(input_size, 64)
        self.layer2 = nn.Linear(64, 32)
        self.layer3 = nn.Linear(32, 16)
        self.dropout = nn.Dropout(dropout_rate)
        self.output = nn.Linear(16, output_size)

    def forward(self, x):
        """ Forward pass through the network. """
        x = F.relu(self.layer1(x))
        x = F.relu(self.layer2(x))
        x = self.dropout(x)
        x = F.relu(self.layer3(x))
        x = self.output(x)
        return x

<h3 style="color:#BF66F2 ">  Recap: </h3>
<div style="margin-top: -25px;">

- nn.ReLU() creates an nn.Module which can be added to an nn.Sequential model
- nn.functional.relu is just the functional API call to the relu function that can be added in a forward method

In [27]:
# Define a transform to normalize the data
transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((0.5,), (0.5,))])

#### Download and load the training and test data
train_data = MNIST(root='./data', train=True, download=True, transform=transform)
test_data = MNIST(root='./data', train=False, download=True, transform=transform)
train_loader = DataLoader(train_data, batch_size=32, shuffle=True)
test_loader = DataLoader(test_data, batch_size=32, shuffle=False)

Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz to ./data/MNIST/raw/train-images-idx3-ubyte.gz


100%|██████████| 9912422/9912422 [00:00<00:00, 11631440.37it/s]


Extracting ./data/MNIST/raw/train-images-idx3-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz to ./data/MNIST/raw/train-labels-idx1-ubyte.gz


100%|██████████| 28881/28881 [00:00<00:00, 31710914.61it/s]


Extracting ./data/MNIST/raw/train-labels-idx1-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw/t10k-images-idx3-ubyte.gz


100%|██████████| 1648877/1648877 [00:00<00:00, 11770007.11it/s]


Extracting ./data/MNIST/raw/t10k-images-idx3-ubyte.gz to ./data/MNIST/raw

Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz
Downloading http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz


100%|██████████| 4542/4542 [00:00<00:00, 17721422.11it/s]

Extracting ./data/MNIST/raw/t10k-labels-idx1-ubyte.gz to ./data/MNIST/raw






In [32]:
### Initialize the model, criterion, and optimizer
model = MyMLP3(input_size=784, output_size=10, dropout_rate=0.2)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

# Training
num_epochs = 10 
for epoch in range(num_epochs):
    for inputs, labels in train_loader:
        # Reshape the input
        inputs = inputs.view(-1, 28*28)  
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
    
    if (epoch+1) % 1 == 0:  
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')


Epoch [1/10], Loss: 0.4742
Epoch [2/10], Loss: 0.3746
Epoch [3/10], Loss: 0.5508
Epoch [4/10], Loss: 0.4877
Epoch [5/10], Loss: 0.0695
Epoch [6/10], Loss: 0.1630
Epoch [7/10], Loss: 0.1675
Epoch [8/10], Loss: 0.0565
Epoch [9/10], Loss: 0.1157
Epoch [10/10], Loss: 0.1757


In [33]:
# Evaluate the model
model.eval()
total, correct = 0, 0

with torch.no_grad():
    for inputs, labels in test_loader:
        # Reshape the input
        inputs = inputs.view(-1, 28*28)  
        outputs = model(inputs)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f'Accuracy of the model on the test set: {100 * correct / total}%')

Accuracy of the model on the test set: 95.95%
