# **Chapter 7. Deep Learning**

## **7.2. Multilayer Perceptron (MLP)**

In this section, we will explore the structure of an MLP and use MLP networks to solve more complex regression and classification problems.

### **7.2.1. Structure of MLP Network**

The structure of an MLP network is shown below:

![MLP structure](images/MLP_structure.png)

**Key points to remember about MLP networks**

- An MLP can solve complex, non-linear problems that a single perceptron cannot.
- An MLP contains one or more hidden layers and one output layer. Each layer has a defined number of perceptrons (neurons).
- Each neuron has a weight ($w$) for each of its inputs, and a bias ($b$).
- The output of a neuron is calculated using the following equation:

 $$\text{output} = f(\sum_{i}(w_i \times x_i) + b)$$

- Here $f$ is the activation function: the weighted sum of a neuron's inputs plus its bias is passed through it. Activation functions serve several purposes in a neural network, such as mapping values to a desired range and introducing non-linearity into the network.
- Some commonly used activation functions in artificial neural networks (ANNs) are:
 
![Activation functions](images/activation_functions.png)

- The input and output values of the network should be scaled to a small range (for example, $[0, 1]$). Inputs on very different scales, or very large targets, make gradient descent slow or unstable.
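To make the neuron equation concrete, here is a minimal NumPy sketch that computes $f(\sum_i w_i x_i + b)$ for one neuron with three widely used activation functions (sigmoid, tanh, ReLU). The input, weight, and bias values are made up for illustration; they were chosen so that the weighted sum is exactly 0.

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Squashes any real number into the range (-1, 1)
    return np.tanh(z)

def relu(z):
    # Keeps positive values and replaces negative values with 0
    return np.maximum(0.0, z)

def neuron_output(x, w, b, f):
    # Weighted sum of the inputs plus the bias, passed through activation f
    z = np.dot(w, x) + b
    return f(z)

x = np.array([1.0, 2.0, 3.0])    # example inputs (illustrative values)
w = np.array([0.5, -1.0, 0.25])  # example weights
b = 0.75                         # example bias

for f in (sigmoid, tanh, relu):
    print(f.__name__, neuron_output(x, w, b, f))
```

Here $z = 0.5 - 2 + 0.75 + 0.75 = 0$, so the sigmoid neuron outputs 0.5 while tanh and ReLU output 0. The same weighted sum produces different outputs depending on the activation chosen.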

### **7.2.2. How MLP Network Works**

The workflow of an MLP network is similar to that of a single-perceptron network and consists of the following steps:
- **Step 1.** Initialize parameters
- **Step 2.** Forward pass
- **Step 3.** Calculate the value of the loss function
- **Step 4.** Backward pass
- **Step 5.** Update parameters
- Return to **Step 2** and repeat for a fixed number of iterations (epochs) or until the loss is low enough.
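The five steps above can be sketched from scratch in NumPy, without PyTorch, to show what each step does. This is an illustrative sketch under our own assumptions: the XOR dataset, 4 sigmoid hidden neurons, a linear output, MSE loss, plain gradient descent, and a learning rate of 0.1 are choices made here, not part of the original material. The backward pass applies the chain rule layer by layer by hand.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# XOR: a classic non-linear problem that a single perceptron cannot solve
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
n_hidden = 4
lr = 0.1

# Step 1. Initialize parameters
W1 = rng.normal(size=(2, n_hidden))
b1 = np.zeros((1, n_hidden))
W2 = rng.normal(size=(n_hidden, 1))
b2 = np.zeros((1, 1))

losses = []
for epoch in range(10000):
    # Step 2. Forward pass
    H = sigmoid(X @ W1 + b1)      # hidden activations, shape (4, n_hidden)
    Y_hat = H @ W2 + b2           # linear output, shape (4, 1)

    # Step 3. Loss (mean squared error)
    loss = np.mean((Y_hat - Y) ** 2)
    losses.append(loss)

    # Step 4. Backward pass (chain rule, from the output back to the input)
    dY_hat = 2.0 * (Y_hat - Y) / len(X)   # dL/dY_hat
    dW2 = H.T @ dY_hat
    db2 = dY_hat.sum(axis=0, keepdims=True)
    dH = dY_hat @ W2.T
    dZ1 = dH * H * (1.0 - H)              # sigmoid'(z) = s(z) * (1 - s(z))
    dW1 = X.T @ dZ1
    db1 = dZ1.sum(axis=0, keepdims=True)

    # Step 5. Update parameters (plain gradient descent)
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2

print(f'Initial loss: {losses[0]:.4f}, final loss: {losses[-1]:.4f}')
```

The loss should decrease over the epochs; with an unlucky initialization, a network this small can stall in a poor local minimum, so the final loss may vary with the seed. In PyTorch, Step 4 is done by `loss.backward()` and Step 5 by `optimizer.step()`.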

During the backward pass, the chain rule is applied to propagate the gradients through the activation functions. For example:

![Backpropagation - Activation functions](images/backpropagation_activation_function.png)

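Because the gradient flows through the activation function, **the first derivative of the activation function must exist** for backpropagation to work. In practice, it is enough for the derivative to exist almost everywhere: ReLU, for example, is not differentiable at 0, and PyTorch simply uses a gradient of 0 at that point.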
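The chain rule through a sigmoid activation can be checked numerically. The sketch below builds a single neuron $y = \sigma(wx + b)$ with loss $L = (y - t)^2$, lets PyTorch's autograd compute $\partial L/\partial w$ and $\partial L/\partial b$, and then recomputes them by hand as $\frac{\partial L}{\partial y}\cdot\frac{\partial y}{\partial z}\cdot\frac{\partial z}{\partial w}$, using $\sigma'(z) = \sigma(z)(1 - \sigma(z))$. The values of $x$, $t$, $w$, and $b$ are arbitrary examples.

```python
import torch

# A single neuron: y = sigmoid(w * x + b), loss L = (y - t)^2
x = torch.tensor(2.0)
t = torch.tensor(1.0)                       # target value (arbitrary example)
w = torch.tensor(0.5, requires_grad=True)
b = torch.tensor(-0.5, requires_grad=True)

z = w * x + b
y = torch.sigmoid(z)
loss = (y - t) ** 2
loss.backward()                             # autograd applies the chain rule

# The same gradients by hand: dL/dw = dL/dy * dy/dz * dz/dw
with torch.no_grad():
    dL_dy = 2 * (y - t)
    dy_dz = y * (1 - y)                     # derivative of the sigmoid
    dz_dw = x
    dz_db = 1.0
    manual_dw = dL_dy * dy_dz * dz_dw
    manual_db = dL_dy * dy_dz * dz_db

print(w.grad.item(), manual_dw.item())
print(b.grad.item(), manual_db.item())
```

Both lines should print matching pairs, which is exactly what backpropagation computes for every weight in an MLP, one layer at a time.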

### **7.2.3. MLP for Regression**

#### ***7.2.3.1. MLP network with 1 input, 1 hidden layer containing 5 neurons, 1 output***

In this section, we will build an MLP network to perform regression on the non-linear function $y = x^3 - 10x^2 + 25$.

*Network structure:*

![MLP network](images/MLP_1_5_1.png)

**Import required libraries**

In [None]:
import math
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.preprocessing import MinMaxScaler
import matplotlib.pyplot as plt
from tqdm import tqdm

**a. Load data**

In [None]:
# Define the data points
x = np.linspace(0, 10, 11)
y = x*x*x - 10*x*x + 25

**b. Data preprocessing**

In [None]:
# Define input and output scalers
input_scaler = MinMaxScaler(feature_range=(0, 1))
output_scaler = MinMaxScaler(feature_range=(0, 1))

# Scale the data to the range from 0 to 1
x_scaled = input_scaler.fit_transform(x.reshape(-1, 1)).reshape(-1)
y_scaled = output_scaler.fit_transform(y.reshape(-1, 1)).reshape(-1)

# Convert to tensor
x_scaled = torch.tensor(x_scaled, dtype=torch.float)
y_scaled = torch.tensor(y_scaled, dtype=torch.float)
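`MinMaxScaler` with `feature_range=(0, 1)` maps each value $v$ to $(v - v_{min}) / (v_{max} - v_{min})$, and `inverse_transform` reverses this. The short check below, which regenerates the same data points, confirms that the manual formula and scikit-learn agree and that the round trip recovers the original values.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

x = np.linspace(0, 10, 11)
y = x * x * x - 10 * x * x + 25

# Min-max scaling by hand: (v - min) / (max - min)
y_manual = (y - y.min()) / (y.max() - y.min())

scaler = MinMaxScaler(feature_range=(0, 1))
y_sklearn = scaler.fit_transform(y.reshape(-1, 1)).reshape(-1)
print(np.allclose(y_manual, y_sklearn))

# inverse_transform undoes the scaling: v = v_scaled * (max - min) + min
y_back = scaler.inverse_transform(y_sklearn.reshape(-1, 1)).reshape(-1)
print(np.allclose(y_back, y))
```

This is why the same fitted scaler object must be kept and reused: it stores $v_{min}$ and $v_{max}$ for later predictions.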

**c. Create model**

In [None]:
# Define the regression class
class RegressionModel(nn.Module):
    def __init__(self):
        super(RegressionModel, self).__init__()
        self.hidden = nn.Linear(1, 5)  # 1 input and 5 hidden
        self.output = nn.Linear(5, 1)  # 5 hidden and 1 output

    def forward(self, x):
        x = self.hidden(x)
        x = torch.sigmoid(x)
        return self.output(x)

# Instantiate the model
model = RegressionModel()

# View the model architecture
print(model)
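For a simple feed-forward network like this, the same 1-5-1 architecture can also be written with `nn.Sequential`. The sketch below (a separate `seq_model`, so the `model` above is not overwritten) counts the trainable parameters: $1 \times 5 + 5$ in the hidden layer and $5 \times 1 + 1$ in the output layer, 16 in total. It also shows that a batch of scalar inputs must have shape `(N, 1)`.

```python
import torch
import torch.nn as nn

# The same 1-5-1 architecture written with nn.Sequential
seq_model = nn.Sequential(
    nn.Linear(1, 5),   # hidden layer: 5 weights + 5 biases
    nn.Sigmoid(),
    nn.Linear(5, 1),   # output layer: 5 weights + 1 bias
)

n_params = sum(p.numel() for p in seq_model.parameters())
print(n_params)

# A batch of 11 scalar inputs must have shape (11, 1)
out = seq_model(torch.zeros(11, 1))
print(out.shape)
```

The subclass style used above is more flexible (arbitrary logic in `forward`), while `nn.Sequential` is shorter when layers are simply stacked.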

**d. Training**

In [None]:
# Define the loss function and optimizer
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

# Define the train function
def train(x, y):
    # Forward pass
    outputs = model(x.unsqueeze(1))
    loss = criterion(outputs, y.unsqueeze(1))
    
    # Backward and optimize
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    
    return loss.item()

# Create lists of losses for visualization
losses = []
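The `unsqueeze(1)` calls in `train` matter. The model outputs shape `(N, 1)`, while `y_scaled` has shape `(N,)`. If the shapes differ, `nn.MSELoss` broadcasts them to `(N, N)`, emits a `UserWarning`, and computes the wrong loss. The sketch below uses a "perfect" prediction to show the difference: the correct loss is 0, the broadcast one is not.

```python
import warnings
import torch
import torch.nn as nn

criterion = nn.MSELoss()
targets = torch.arange(11, dtype=torch.float)   # shape (N,), like y_scaled
outputs = targets.unsqueeze(1).clone()          # shape (N, 1), a perfect prediction

# Mismatched shapes (N, 1) vs (N,) broadcast to (N, N): PyTorch warns and the loss is wrong
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    wrong = criterion(outputs, targets)

# Matching shapes (N, 1) vs (N, 1): the loss is computed element by element
right = criterion(outputs, targets.unsqueeze(1))
print(len(caught) > 0, wrong.item(), right.item())
```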

In [None]:
# Train the model
num_epochs = 5000
progress_bar = tqdm(range(num_epochs))
for epoch in progress_bar:
    train_loss = train(x_scaled, y_scaled)
    
    # Add loss to lists for visualization
    losses.append(train_loss)
        
    # Print progress
    progress_bar.set_description(f'Epoch [{epoch+1}/{num_epochs}], Train loss: {train_loss:.4f}') 

**e. Visualization**

In [None]:
# Axis limits
x_min = math.floor(min(x))
x_max = math.ceil(max(x))
y_min = math.floor(min(y))
y_max = math.ceil(max(y))

# Create the plot
x_values = np.linspace(x_min, x_max, num=101)
x_values_scaled = input_scaler.transform(x_values.reshape(-1, 1)).reshape(-1)
y_values_scaled = model(torch.tensor(x_values_scaled).float().unsqueeze(1))
y_values_scaled = y_values_scaled.detach().numpy()
y_values = output_scaler.inverse_transform(y_values_scaled.reshape(-1, 1)).reshape(-1)

fig = plt.figure()
plt.scatter(x, y, color='red')
plt.plot(x_values, y_values)
plt.show()

# Visualize MSE loss values over time
plt.plot(losses)
plt.xlabel('epoch')
plt.ylabel('MSE loss')

**f. Make prediction**

In [None]:
# Input for prediction
x_pred = 5

# Scale input to range 0 to 1
x_pred_scaled = input_scaler.transform([[x_pred]])

# Convert to tensor
x_pred_scaled = torch.tensor(x_pred_scaled).float()

# Run model forward
y_pred_scaled = model(x_pred_scaled)

# Convert back to number
y_pred_scaled = y_pred_scaled.item()

# Scale output back to original range
y_pred = output_scaler.inverse_transform([[y_pred_scaled]])[0][0]

# Show result
print(y_pred)
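The prediction steps above (scale, convert, forward, convert back, unscale) can be wrapped in one reusable function. `predict` below is a hypothetical helper written for this sketch; it also switches the model to evaluation mode and disables gradient tracking, which is not needed for inference. To avoid overwriting the trained `model` and scalers, the example uses stand-in objects named `demo_*` (an untrained 1-5-1 network).

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.preprocessing import MinMaxScaler

def predict(model, input_scaler, output_scaler, x_new):
    """Hypothetical helper: scale inputs, run the model, unscale outputs.

    x_new: array-like of shape (n_samples, n_features) in original units.
    """
    x_scaled = input_scaler.transform(np.asarray(x_new, dtype=float))
    model.eval()
    with torch.no_grad():  # no gradients are needed for inference
        y_scaled = model(torch.tensor(x_scaled, dtype=torch.float)).numpy()
    return output_scaler.inverse_transform(y_scaled)

# Stand-ins for the trained objects (illustration only)
x_demo = np.linspace(0, 10, 11).reshape(-1, 1)
y_demo = x_demo ** 3 - 10 * x_demo ** 2 + 25
demo_in_scaler = MinMaxScaler().fit(x_demo)
demo_out_scaler = MinMaxScaler().fit(y_demo)
demo_model = nn.Sequential(nn.Linear(1, 5), nn.Sigmoid(), nn.Linear(5, 1))

y_new = predict(demo_model, demo_in_scaler, demo_out_scaler, [[2.5], [5.0], [7.5]])
print(y_new.shape)
```

With the trained objects, the call would be `predict(model, input_scaler, output_scaler, [[5]])`, which returns a `(1, 1)` array instead of a single number.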

**g. Save and load model**

In [None]:
# Save model
model_name = 'MLP1'
file_name = f'./{model_name}_{num_epochs}.ckpt'
torch.save(model.state_dict(), file_name)

In [None]:
# Load model
file_name = f'./{model_name}_5000.ckpt'
loaded_model = RegressionModel()
loaded_model.load_state_dict(torch.load(file_name, weights_only=True))
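A `state_dict` stores only the weights, so the architecture must be rebuilt before loading, and the loaded model should be switched to evaluation mode before use. The self-contained sketch below saves a small stand-in model to a temporary directory, restores it, and checks that both models give identical outputs. The file name `demo.ckpt` is arbitrary, and `weights_only=True` requires a reasonably recent PyTorch version.

```python
import os
import tempfile
import torch
import torch.nn as nn

torch.manual_seed(0)
demo_model = nn.Sequential(nn.Linear(1, 5), nn.Sigmoid(), nn.Linear(5, 1))

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'demo.ckpt')
    torch.save(demo_model.state_dict(), path)

    # The architecture must be rebuilt before the weights can be loaded
    restored = nn.Sequential(nn.Linear(1, 5), nn.Sigmoid(), nn.Linear(5, 1))
    restored.load_state_dict(torch.load(path, weights_only=True))

restored.eval()  # switch to inference mode before predicting
x_check = torch.linspace(0, 1, 7).unsqueeze(1)
with torch.no_grad():
    same = torch.equal(demo_model(x_check), restored(x_check))
print(same)
```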

#### ***7.2.3.2. MLP network with 3 inputs, 2 hidden layers containing 5 neurons each, 2 outputs***

In this section, we will build an MLP network to perform regression on a non-linear function with multiple inputs and outputs, $(y_1, y_2) = f(x_1, x_2, x_3)$.

*Network structure:*
    
![MLP network](images/MLP_3_5_5_2.png)

**Import required libraries**

In [None]:
# Import modules
import math
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.preprocessing import MinMaxScaler
import matplotlib.pyplot as plt
from tqdm import tqdm

**a. Load data**

In [None]:
# Define the data points
np.random.seed(42)
x1 = np.random.rand(10)
x2 = np.random.rand(10)
x3 = np.random.rand(10)
y1 = x1*x1*x2 - 5*x2*x3 + 5
y2 = x3*x1 + 8*x2*x1 - 15

**b. Data preprocessing**

In [None]:
# Combine inputs into 1 array
x = np.column_stack((x1, x2, x3))

# Combine outputs into 1 array
y = np.column_stack((y1, y2))

# Define input and output scalers
input_scaler = MinMaxScaler(feature_range=(0, 1))
output_scaler = MinMaxScaler(feature_range=(0, 1))

# Scale the data to the range from 0 to 1
x_scaled = input_scaler.fit_transform(x)
y_scaled = output_scaler.fit_transform(y)

# Convert to tensor
x_scaled = torch.tensor(x_scaled, dtype=torch.float)
y_scaled = torch.tensor(y_scaled, dtype=torch.float)

**c. Create model**

In [None]:
# Define regression class
class RegressionModel(nn.Module):
    def __init__(self):
        super(RegressionModel, self).__init__()
        self.hidden1 = nn.Linear(3, 5)  # 3 input and 5 hidden
        self.hidden2 = nn.Linear(5, 5)  # 5 hidden and 5 hidden
        self.output = nn.Linear(5, 2)  # 5 hidden and 2 output

    def forward(self, x):
        x = self.hidden1(x)
        x = torch.sigmoid(x)
        x = self.hidden2(x)
        x = torch.sigmoid(x)
        return self.output(x)

# Instantiate the model
model = RegressionModel()

# View the model architecture
print(model)

**d. Training**

In [None]:
# Define the loss function and optimizer
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.01)

# Define the train function
def train(x, y):
    # Forward pass
    outputs = model(x)
    loss = criterion(outputs, y)
    
    # Backward and optimize
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    
    return loss.item()

# Create lists of losses for visualization
losses = []
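With two outputs, `nn.MSELoss` (default `reduction='mean'`) averages the squared errors over all samples *and* both outputs, so the reported loss is the mean of the per-output MSEs. The sketch below verifies this on random stand-in tensors and shows how to compute the per-output values, which can reveal whether one of $y_1$, $y_2$ is learned worse than the other.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
outputs = torch.rand(10, 2)   # stand-in predictions for (y1, y2)
targets = torch.rand(10, 2)   # stand-in targets

criterion = nn.MSELoss()      # reduction='mean' averages over all 10 * 2 elements
total = criterion(outputs, targets)

# Per-output MSE: average over samples only, one value per output column
per_output = ((outputs - targets) ** 2).mean(dim=0)
print(total.item(), per_output.tolist())
print(torch.allclose(total, per_output.mean()))
```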

In [None]:
# Train the model
num_epochs = 5000
progress_bar = tqdm(range(num_epochs))
for epoch in progress_bar:
    train_loss = train(x_scaled, y_scaled)
    
    # Add loss to lists for visualization
    losses.append(train_loss)
        
    # Print progress
    progress_bar.set_description(f'Epoch [{epoch+1}/{num_epochs}], Train loss: {train_loss:.4f}') 

**e. Visualization**

In [None]:
# Visualize MSE loss values over time
plt.plot(losses)
plt.xlabel('epoch')
plt.ylabel('MSE loss')

**f. Make prediction**

In [None]:
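# Input for prediction
# Note: the training inputs x1, x2, x3 lie in [0, 1), so this point lies far outside
# the training range; the model must extrapolate and the prediction may be unreliable
x_pred = [8, 3, 4]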

# Scale input to range 0 to 1
x_pred_scaled = input_scaler.transform([x_pred])

# Convert to tensor
x_pred_scaled = torch.tensor(x_pred_scaled).float()

# Run model forward
y_pred_scaled = model(x_pred_scaled)

# Convert back to number
y_pred_scaled = y_pred_scaled.detach().numpy()

# Scale output back to original range
y_pred = output_scaler.inverse_transform(y_pred_scaled)[0]

# Show result
print(y_pred)

**g. Save and load model**

In [None]:
# Save model
model_name = 'MLP2'
file_name = f'./{model_name}_{num_epochs}.ckpt'
torch.save(model.state_dict(), file_name)

In [None]:
# Load model
file_name = f'./{model_name}_5000.ckpt'
loaded_model = RegressionModel()
loaded_model.load_state_dict(torch.load(file_name, weights_only=True))

<p style="background-color: lightgreen; text-align: center; font-size: 18px; color: red; padding: 5px; border-radius: 10px;"><b>Exercise 1</b></p>

1. **Load Data:** Load the solubility dataset from the file `Solubility.csv`.

2. **Data Preprocessing:** Scale the inputs and outputs with min-max scaler.

3. **Model Training and Evaluation:** Train an MLP network with 2 hidden layers of 32 neurons each, and evaluate its performance in predicting the water solubility of molecules.

4. **Change Model Parameters:** Repeat with different MLP architectures, and explore the effects of the number of epochs and the learning rate on model performance and training time.

### **7.2.4. MLP for Binary Classification**

In this section, we will build an MLP network to predict whether a molecule can penetrate the blood-brain barrier.

**Import required libraries**

In [None]:
# Import modules
import math
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import VarianceThreshold
from sklearn.decomposition import PCA
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score, roc_curve, roc_auc_score, ConfusionMatrixDisplay
import matplotlib.pyplot as plt
from tqdm import tqdm

**a. Load data**

In [None]:
# Load the dataset
data_file_path = './datasets/BBBP.csv'
df = pd.read_csv(data_file_path)
df.head()

**c. Get the input and output columns**

In [None]:
x = df.drop('p_np', axis=1).to_numpy()
y = df['p_np'].to_numpy()
print(f'Shape of inputs: {x.shape}')
print(f'Shape of output: {y.shape}')

**d. Data preprocessing**

In [None]:
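# Set the random seeds
random_seed = 0
np.random.seed(random_seed)
torch.manual_seed(random_seed)

# Split the data into training and testing sets
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=random_seed)

# Split the training dataset into training and validation sets
x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size=0.25, random_state=random_seed)

# Reduce the number of inputs with a variance threshold and PCA.
# Both are fitted on the training set only and then applied to the validation and test sets,
# so that no information from the held-out data leaks into preprocessing.
selector = VarianceThreshold(threshold=0.1)
x_train = selector.fit_transform(x_train)
x_val = selector.transform(x_val)
x_test = selector.transform(x_test)

pca = PCA(n_components=32)  # Reduce to 32 dimensions
x_train = pca.fit_transform(x_train)
x_val = pca.transform(x_val)
x_test = pca.transform(x_test)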

x_train = torch.tensor(x_train, dtype=torch.float)
y_train = torch.tensor(y_train, dtype=torch.float)
x_val = torch.tensor(x_val, dtype=torch.float)
y_val = torch.tensor(y_val, dtype=torch.float)
x_test = torch.tensor(x_test, dtype=torch.float)
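Preprocessing steps that learn from data (variance threshold, PCA, scalers) should be fitted on the training set only and then reused on held-out data. One way to keep this tidy is scikit-learn's `make_pipeline`, which chains the steps into a single object with `fit_transform` and `transform`. The sketch below uses random binary stand-in data (200 samples, 100 features), which is an assumption for illustration and not the BBBP dataset.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import VarianceThreshold
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Stand-in data: 200 samples with 100 binary features (not the BBBP data)
rng = np.random.default_rng(0)
x_demo = rng.integers(0, 2, size=(200, 100)).astype(float)
y_demo = rng.integers(0, 2, size=200)

x_tr, x_te, y_tr, y_te = train_test_split(x_demo, y_demo, test_size=0.2, random_state=0)

preprocess = make_pipeline(VarianceThreshold(threshold=0.1), PCA(n_components=32))
x_tr_reduced = preprocess.fit_transform(x_tr)   # statistics learned from training data only
x_te_reduced = preprocess.transform(x_te)       # reused unchanged on the test data

print(x_tr_reduced.shape, x_te_reduced.shape)
print(preprocess[-1].explained_variance_ratio_.sum())  # variance kept by the 32 components
```

The explained-variance ratio is a quick way to judge whether 32 components retain enough information; it is worth checking on the real data before fixing `n_components`.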

**e. Create model**

In [None]:
# Define the classification model class
class ClassificationModel(nn.Module):
    def __init__(self, n_inputs, n_layers, n_hiddens, n_outputs):
        super(ClassificationModel, self).__init__()
        self.hiddens = nn.ModuleList()
        self.hiddens.append(nn.Linear(n_inputs, n_hiddens))
        for _ in range(1, n_layers):
            self.hiddens.append(nn.Linear(n_hiddens, n_hiddens))
        self.output = nn.Linear(n_hiddens, n_outputs)

    def forward(self, x):
        for hidden in self.hiddens:
            x = torch.sigmoid(hidden(x))
        return self.output(x)

# Instantiate the model
n_inputs = x_train.shape[1]
n_layers = 3
n_hiddens = 64
n_outputs = 1
model = ClassificationModel(n_inputs, n_layers, n_hiddens, n_outputs)

# View the model architecture
print(model)

**f. Training**

In [None]:
# Define the loss function and optimizer
criterion = nn.BCEWithLogitsLoss()
optimizer = optim.AdamW(model.parameters(), lr=0.05)

# Define the train function
def train(x, y):
    # Set the model to train mode
    model.train()
    
    # Forward pass
    output = model(x)
    loss = criterion(output, y)
    
    # Backward and optimize
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    
    return loss.item()

# Define the validation function
def validation(x, y):
    # Set the model to evaluation mode
    model.eval()
    
    # Forward pass without tracking gradients (not needed for validation)
    with torch.no_grad():
        outputs = model(x)
        loss = criterion(outputs, y)
    
    return loss.item()

# Create lists of losses for visualization
train_losses = []
val_losses = []
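The model's `forward` returns raw scores (logits) with no sigmoid on the output layer because `nn.BCEWithLogitsLoss` applies the sigmoid internally, combining it with the binary cross-entropy in a numerically more stable way. The sketch below checks on random stand-in data that it gives the same value as applying `torch.sigmoid` followed by `nn.BCELoss`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(8, 1)                      # raw model outputs (no sigmoid applied)
targets = torch.randint(0, 2, (8, 1)).float()   # binary labels as floats

loss_with_logits = nn.BCEWithLogitsLoss()(logits, targets)
loss_two_step = nn.BCELoss()(torch.sigmoid(logits), targets)
print(loss_with_logits.item(), loss_two_step.item())
print(torch.allclose(loss_with_logits, loss_two_step))
```

This is also why the evaluation step applies `torch.sigmoid` to the logits before rounding them to class labels.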

In [None]:
# Train the model
num_epochs = 200
progress_bar = tqdm(range(num_epochs))
for epoch in progress_bar:
    train_loss = train(x_train, y_train.unsqueeze(1))
    val_loss = validation(x_val, y_val.unsqueeze(1))
    
    # Add loss to lists for visualization
    train_losses.append(train_loss)
    val_losses.append(val_loss)
        
    # Print progress
    progress_bar.set_description(f'Epoch [{epoch+1}/{num_epochs}], Train loss: {train_loss:.4f}, Validation loss: {val_loss:.4f}') 

**g. Visualization**

In [None]:
# Visualize training and validation losses over time
plt.plot(train_losses, label='train')
plt.plot(val_losses, label='validation')
plt.xlabel('epoch')
plt.ylabel('BCE loss with logits')
plt.legend()

**h. Evaluation**

In [None]:
# Set the model to evaluation mode
model.eval()

# Forward the test set
logits = model(x_test)
probabilities = torch.sigmoid(logits).detach().cpu().numpy().reshape(-1)
y_pred = np.round(probabilities)

# Evaluate the model
cm = confusion_matrix(y_test, y_pred)
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
fpr, tpr, thresholds = roc_curve(y_test, probabilities)
roc_auc = roc_auc_score(y_test, probabilities)

# Display classification metrics
print(f"Accuracy: {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1 Score: {f1:.4f}")

# Display confusion matrix
disp = ConfusionMatrixDisplay(confusion_matrix=cm)
disp.plot(cmap="Blues")

# Plot the ROC curve
plt.figure(figsize=(8, 6))
plt.plot(fpr, tpr, label=f'ROC Curve (AUC = {roc_auc:.4f})', linewidth=2)
plt.plot([0, 1], [0, 1], 'k--', linewidth=1)  # Diagonal line
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate', fontsize=12)
plt.ylabel('True Positive Rate', fontsize=12)
plt.title('Receiver Operating Characteristic (ROC) Curve', fontsize=14)
plt.legend(loc="lower right", fontsize=12)
plt.grid(alpha=0.3)
plt.show()
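The metrics above follow directly from the confusion matrix. For binary labels, scikit-learn orders it as `[[TN, FP], [FN, TP]]`; precision is $TP/(TP+FP)$, recall is $TP/(TP+FN)$, and F1 is their harmonic mean. The sketch below uses small made-up labels (TP = 4, TN = 3, FP = 1, FN = 2) and checks the hand calculation against scikit-learn.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score

# Small made-up example labels
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])
y_hat = np.array([1, 0, 0, 1, 0, 1, 1, 0, 1, 0])

# For binary labels sklearn orders the matrix as [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

precision = tp / (tp + fp)   # of the predicted positives, how many are correct
recall = tp / (tp + fn)      # of the actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)

print(tn, fp, fn, tp)
print(precision, precision_score(y_true, y_hat))
print(recall, recall_score(y_true, y_hat))
print(f1, f1_score(y_true, y_hat))
```

For a task like blood-brain barrier penetration, the choice between favoring precision and favoring recall depends on which kind of error is more costly; the ROC curve shows this trade-off across all thresholds instead of the fixed 0.5 used by `np.round`.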

**i. Save and load model**

In [None]:
# Save model
model_name = 'MLP3'
file_name = f'./{model_name}_{num_epochs}.ckpt'
torch.save(model.state_dict(), file_name)

In [None]:
# Load model
file_name = f'./{model_name}_200.ckpt'
loaded_model = ClassificationModel(n_inputs, n_layers, n_hiddens, n_outputs)
loaded_model.load_state_dict(torch.load(file_name, weights_only=True))

<p style="background-color: lightgreen; text-align: center; font-size: 18px; color: red; padding: 5px; border-radius: 10px;"><b>Exercise 2</b></p>

1. **Load Data:** Load the breast cancer dataset from the file `BreastCancer.csv`.

2. **Data Preprocessing:** Scale the inputs with a min-max scaler (the binary labels do not need scaling).

3. **Model Training and Evaluation:** Train an MLP network with a structure of your choice, and evaluate its performance in predicting breast cancer.

4. **Change Model Parameters:** Repeat with different MLP architectures, and explore the effects of the number of epochs and the learning rate on model performance and training time.

### **7.2.5. MLP for Multiclass Classification**

In this section, we will build an MLP network to predict the class of a wine sample (the `Wine` column of `Wine.csv`) from its other features.

**Import required libraries**

In [None]:
# Import modules
import math
import numpy as np
import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import VarianceThreshold
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score, roc_curve, roc_auc_score, ConfusionMatrixDisplay
import matplotlib.pyplot as plt
from tqdm import tqdm

**a. Load data**

In [None]:
# Load the dataset
data_file_path = './datasets/Wine.csv'
df = pd.read_csv(data_file_path)
df.head()

**c. Get the input and output columns**

In [None]:
x = df.drop('Wine', axis=1).to_numpy()
y = df['Wine'].to_numpy()
print(f'Shape of inputs: {x.shape}')
print(f'Shape of output: {y.shape}')

**d. Data preprocessing**

In [None]:
# Encode the output with one-hot encoder
output_encoder = OneHotEncoder(sparse_output=False)
y_encoded = output_encoder.fit_transform(y.reshape(-1, 1))

# Set the random seeds
random_seed = 0
np.random.seed(random_seed)
torch.manual_seed(random_seed)

# Split the data into training and testing sets
x_train, x_test, y_train_encoded, y_test_encoded = train_test_split(x, y_encoded, test_size=0.2, random_state=random_seed)

# Split the training dataset into training and validation sets
x_train, x_val, y_train_encoded, y_val_encoded = train_test_split(x_train, y_train_encoded, test_size=0.25, random_state=random_seed)

# Scale the inputs to the range from 0 to 1.
# The scaler is fitted on the training set only and then applied to the validation and test sets.
input_scaler = MinMaxScaler(feature_range=(0, 1))
x_train_scaled = input_scaler.fit_transform(x_train)
x_val_scaled = input_scaler.transform(x_val)
x_test_scaled = input_scaler.transform(x_test)

x_train_scaled = torch.tensor(x_train_scaled, dtype=torch.float)
y_train_encoded = torch.tensor(y_train_encoded, dtype=torch.float)
x_val_scaled = torch.tensor(x_val_scaled, dtype=torch.float)
y_val_encoded = torch.tensor(y_val_encoded, dtype=torch.float)
x_test_scaled = torch.tensor(x_test_scaled, dtype=torch.float)

**e. Create model**

In [None]:
# Define the classification model class
class ClassificationModel(nn.Module):
    def __init__(self, n_inputs, n_layers, n_hiddens, n_outputs):
        super(ClassificationModel, self).__init__()
        self.hiddens = nn.ModuleList()
        self.hiddens.append(nn.Linear(n_inputs, n_hiddens))
        for _ in range(1, n_layers):
            self.hiddens.append(nn.Linear(n_hiddens, n_hiddens))
        self.output = nn.Linear(n_hiddens, n_outputs)

    def forward(self, x):
        for hidden in self.hiddens:
            x = torch.sigmoid(hidden(x))
        return self.output(x)

# Instantiate the model
n_inputs = x_train_scaled.shape[1]
n_layers = 3
n_hiddens = 64
n_outputs = y_train_encoded.shape[1]
model = ClassificationModel(n_inputs, n_layers, n_hiddens, n_outputs)

# View the model architecture
print(model)

**f. Training**

In [None]:
# Define the loss function and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.AdamW(model.parameters(), lr=0.01)

# Define the train function
def train(x, y):
    # Set the model to train mode
    model.train()
    
    # Forward pass
    output = model(x)
    loss = criterion(output, y)
    
    # Backward and optimize
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    
    return loss.item()

# Define the validation function
def validation(x, y):
    # Set the model to evaluation mode
    model.eval()
    
    # Forward pass without tracking gradients (not needed for validation)
    with torch.no_grad():
        outputs = model(x)
        loss = criterion(outputs, y)
    
    return loss.item()

# Create lists of losses for visualization
train_losses = []
val_losses = []

In [None]:
# Train the model
num_epochs = 500
progress_bar = tqdm(range(num_epochs))
for epoch in progress_bar:
    train_loss = train(x_train_scaled, y_train_encoded)
    val_loss = validation(x_val_scaled, y_val_encoded)
    
    # Add loss to lists for visualization
    train_losses.append(train_loss)
    val_losses.append(val_loss)
        
    # Print progress
    progress_bar.set_description(f'Epoch [{epoch+1}/{num_epochs}], Train loss: {train_loss:.4f}, Validation loss: {val_loss:.4f}') 

**g. Visualization**

In [None]:
# Visualize training and validation losses over time
plt.plot(train_losses, label='train')
plt.plot(val_losses, label='validation')
plt.xlabel('epoch')
plt.ylabel('Cross entropy loss')
plt.legend()

**h. Evaluation**

In [None]:
# Set the model to evaluation mode
model.eval()

# Forward the test set
logits = model(x_test_scaled)
probabilities = torch.softmax(logits, dim=1).detach().cpu().numpy()  # Get probabilities
y_pred = np.argmax(probabilities, axis=1) # Get predicted class labels
y_test = np.argmax(y_test_encoded, axis=1) # Get actual class labels

# Evaluate the model
cm = confusion_matrix(y_test, y_pred)
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, average='weighted')
recall = recall_score(y_test, y_pred, average='weighted')
f1 = f1_score(y_test, y_pred, average='weighted')

# Display classification metrics
print(f"Accuracy: {accuracy:.4f}")
print(f"Precision: {precision:.4f}")
print(f"Recall: {recall:.4f}")
print(f"F1 Score: {f1:.4f}")

# Display confusion matrix
disp = ConfusionMatrixDisplay(confusion_matrix=cm)
disp.plot(cmap="Blues")
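With `average='weighted'`, scikit-learn computes the metric separately for each class and then averages the per-class values, weighting each by the number of true samples of that class (its support). The sketch below uses made-up labels for three classes and reproduces the weighted precision by hand: the per-class precisions are 2/3, 1/2, and 1 with supports 3, 2, and 5, giving $(2 + 1 + 5)/10 = 0.8$.

```python
import numpy as np
from sklearn.metrics import precision_score

# Small made-up example with three classes
y_true = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2, 2])
y_hat = np.array([0, 1, 0, 1, 1, 2, 2, 0, 2, 1])

per_class = precision_score(y_true, y_hat, average=None)   # one precision per class
support = np.bincount(y_true)                              # true samples per class
weighted_manual = np.sum(per_class * support) / support.sum()
weighted_sklearn = precision_score(y_true, y_hat, average='weighted')

print(per_class, support)
print(weighted_manual, weighted_sklearn)
```

By contrast, `average='macro'` gives every class equal weight, which makes poor performance on a small class more visible.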

<p style="background-color: lightgreen; text-align: center; font-size: 18px; color: red; padding: 5px; border-radius: 10px;"><b>Exercise 3</b></p>

1. **Load Data:** Load the iris flower dataset from the file `IrisFlower.csv`.

2. **Data Preprocessing:** Scale the inputs with a min-max scaler and encode the outputs with a one-hot encoder.

3. **Model Training and Evaluation:** Train an MLP network with a structure of your choice, and evaluate its performance in predicting iris flower species.

4. **Change Model Parameters:** Repeat with different MLP architectures, and explore the effects of the number of epochs and the learning rate on model performance and training time.