## **#07: Deep Learning Advanced: Hyperparameters**
- Instructor: [Jaeung Sim](https://jaeungs.github.io/) (University of Connecticut)
- Course: OPIM 5512 Data Science Using Python
- Last updated: March 12, 2025

**Objectives**
1. Use `tensorflow` to build and train a basic deep learning model.
1. Revise hyperparameters to improve model performance.

**References**
* [Deep Learning Basics by Google Colab](https://colab.research.google.com/github/lexfridman/mit-deep-learning/blob/master/tutorial_deep_learning_basics/deep_learning_basics.ipynb)
* [TensorFlow - Single Layer Perceptron by Tutorials Point](https://www.tutorialspoint.com/tensorflow/tensorflow_single_layer_perceptron.htm)
* [TensorFlow - Multi-Layer Perceptron Learning by Tutorials Point](https://www.tutorialspoint.com/tensorflow/tensorflow_multi_layer_perceptron_learning.htm)
* [TensorFlow - Keras by Tutorials Point](https://www.tutorialspoint.com/tensorflow/tensorflow_keras.htm)

#### **Part 0. Setup and Data Exploration**

In [None]:
# TensorFlow and tf.keras
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

# Commonly used modules
import numpy as np
import os
import sys

# Images, plots, display, and visualization
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import cv2
import IPython
from six.moves import urllib

print(tf.__version__)

We are going to use the Boston housing dataset, which has 506 rows of data with 13 features each. Our task is to build a regression model that takes these 13 features as input and outputs a single predicted value: the "median value of owner-occupied homes (in $1000)."

Now, we load the dataset. Loading the dataset returns four NumPy arrays:
* The `train_features` and `train_labels` arrays are the *training set*—the data the model uses to learn.
* The model is tested against the *test set*: the `test_features` and `test_labels` arrays.

In [None]:
# Load the dataset
(train_features, train_labels), (test_features, test_labels) = keras.datasets.boston_housing.load_data()

In [None]:
# Explore the shapes of the datasets
print("Shape of Train Features: ", train_features.shape)
print("Shape of Train Labels: ", train_labels.shape)
print("Shape of Test Features: ", test_features.shape)
print("Shape of Test Labels: ", test_labels.shape)

In [None]:
# Get per-feature statistics (mean, standard deviation) from the training set to normalize by
train_mean = np.mean(train_features, axis=0)
train_std = np.std(train_features, axis=0)
train_features = (train_features - train_mean) / train_std
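
As a minimal sketch of what this standardization does, the toy example below uses made-up numbers (not the housing data) to compute z-scores from training-set statistics and then reuse the same statistics for a held-out row. The names `toy_train` and `toy_test` are assumptions for illustration only.

```python
import numpy as np

# Toy data (made up for illustration): 4 training rows, 2 features on different scales
toy_train = np.array([[1.0, 100.0],
                      [2.0, 200.0],
                      [3.0, 300.0],
                      [4.0, 400.0]])
toy_test = np.array([[2.5, 250.0]])

# Per-feature statistics come from the training rows only
toy_mean = toy_train.mean(axis=0)
toy_std = toy_train.std(axis=0)

# Apply the same transformation to both sets
toy_train_z = (toy_train - toy_mean) / toy_std
toy_test_z = (toy_test - toy_mean) / toy_std

print(toy_train_z.mean(axis=0))  # approximately 0 for each feature
print(toy_train_z.std(axis=0))   # approximately 1 for each feature
print(toy_test_z)                # the test row sits exactly at the training mean
```

Reusing the training mean and standard deviation for the test rows keeps information about the test set out of the preprocessing step.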

#### **Part 1. Single Layer Perceptron**

The single layer perceptron is one of the earliest neural network models. The neuron stores a vector of weights. To compute its output, it multiplies each element of the input vector by the corresponding weight and sums the results. This weighted sum is then passed through an activation function to produce the output.

![image](https://www.tutorialspoint.com/tensorflow/images/single_layer_perceptron.jpg)
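
The computation described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the Keras implementation: the input, weight, and bias values are made up, and the bias term `b` is an assumption that the text does not mention (a Keras `Dense` layer includes one by default).

```python
import numpy as np

def relu(z):
    """ReLU activation: negative values become 0."""
    return np.maximum(z, 0.0)

def perceptron_output(x, w, b, activation):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    z = np.dot(x, w) + b
    return activation(z)

x = np.array([1.0, 2.0, 3.0])     # toy input vector (assumed values)
w = np.array([0.5, -1.0, 0.25])   # toy weights (assumed values)
b = 1.0                           # toy bias (assumed value)

y = perceptron_output(x, w, b, relu)
print(y)  # 0.5 - 2.0 + 0.75 + 1.0 = 0.25
```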

Now, let us consider the following basic steps of model training:
* The weights are initialized with random values at the beginning of the training.
* For each element of the training set, the error is calculated with the difference between desired output and the actual output. The error calculated is used to adjust the weights.
* The process is repeated until the error on the entire training set falls below the specified threshold or the maximum number of iterations is reached.
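
The three steps above can be sketched with a single linear unit trained on toy data. This is a simplified illustration under stated assumptions, not what Keras does internally: the data, `learning_rate`, `threshold`, and `max_iterations` are all made up, and the update is a plain per-example error correction rather than the Adam optimizer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data (made up): y = 2*x1 - 1*x2 + 0.5
X = rng.normal(size=(50, 2))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5

# Step 1: initialize the weights with random values
w = rng.normal(size=2)
b = 0.0

learning_rate = 0.05    # assumed value
threshold = 1e-4        # stop once the mean squared error falls below this
max_iterations = 1000   # or once this many passes over the data have been made

for iteration in range(max_iterations):
    # Step 2: for each training example, compute the error and adjust the weights
    for xi, yi in zip(X, y):
        error = yi - (np.dot(xi, w) + b)
        w += learning_rate * error * xi
        b += learning_rate * error
    # Step 3: check the error over the entire training set
    mse = np.mean((y - (X @ w + b)) ** 2)
    if mse < threshold:
        break

print(iteration + 1, mse, w, b)
```

Because the toy targets contain no noise, the learned `w` and `b` should end up close to the values used to generate the data.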

**A. Build and Train the Model**

Building the neural network requires configuring the layers of the model, then compiling the model. First, we stack a few layers together using `keras.Sequential`. Next, we configure the loss function, optimizer, and metrics to monitor. These are added during the model's compile step:

* *Loss function* - measures how far the model's predictions are from the targets during training; the optimizer tries to minimize it.
* *Optimizer* - how the model is updated based on the data it sees and its loss function.
* *Metrics* - used to monitor the training and testing steps.
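
To make the loss and metrics concrete, here is a small sketch of the two error measures used throughout this notebook, mean absolute error (MAE) and mean squared error (MSE), computed by hand with NumPy. The target and prediction values are made up for illustration.

```python
import numpy as np

y_true = np.array([20.0, 25.0, 30.0])  # toy targets (made up)
y_pred = np.array([22.0, 24.0, 27.0])  # toy predictions (made up)

errors = y_pred - y_true               # [2, -1, -3]
mae = np.mean(np.abs(errors))          # (2 + 1 + 3) / 3
mse = np.mean(errors ** 2)             # (4 + 1 + 9) / 3

print(mae)  # 2.0
print(mse)  # 4.666...
```

MSE penalizes the largest error (3) more heavily than MAE does, which is why the two measures can rank models differently.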

Here, we train a single layer perceptron, which has no hidden layer:

In [None]:
# Define the model architecture
model = tf.keras.Sequential([
    tf.keras.layers.Dense(1, input_dim=13, activation='relu') # Single output unit, 13 input features, ReLU activation function
])

In [None]:
# Compile the model with mean absolute error as the loss function and the Adam optimizer
model.compile(loss='mae', optimizer='adam', metrics=['mae','mse'])

In [None]:
# Train the model with a batch size of 200 and 100 epochs
model.fit(train_features, train_labels, batch_size=200, epochs=100)
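
As a rough sketch of what these two settings imply, the arithmetic below counts how many weight updates the training run performs. It assumes the training set has 404 rows; check `train_features.shape[0]` in your own session.

```python
import math

n_train = 404      # assumed training-set size; verify with train_features.shape[0]
batch_size = 200
epochs = 100

# Each epoch walks through the data in batches; the last batch may be smaller
steps_per_epoch = math.ceil(n_train / batch_size)
total_updates = steps_per_epoch * epochs

print(steps_per_epoch, total_updates)  # 3 300
```

A smaller batch size gives more (and noisier) updates per epoch, while a larger one gives fewer, smoother updates.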

**B. Model Assessment**

In [None]:
# Standardize test features
test_features_std = (test_features - train_mean) / train_std

In [None]:
# Evaluate the model with standardized features
loss = model.evaluate(test_features_std, test_labels)
print('Loss:', loss)

In [None]:
# To save the performance
performance = {}
performance['Single'] = loss

#### **Part 2. Multi-Layer Perceptron Learning**

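A multi-layer perceptron (MLP) extends the single layer perceptron by placing one or more hidden layers of neurons between the input and the output. Each hidden layer applies its own weights and activation function, which lets the network model nonlinear relationships.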

The diagrammatic representation of multi-layer perceptron learning is as shown below:

![image](https://www.tutorialspoint.com/tensorflow/images/multi_layer_perceptron.jpg)
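
A forward pass through such a network can be sketched in NumPy as follows. This is an illustration only: the weights are random rather than trained, the initialization is an assumption (Keras uses its own initializers), and the input rows are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

n_features, n_hidden = 13, 20   # 13 inputs; 20 hidden units is an assumed size

# Randomly initialized parameters (illustrative only)
W1 = rng.normal(scale=0.1, size=(n_features, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(n_hidden, 1))
b2 = np.zeros(1)

def forward(X):
    """One hidden layer with ReLU, followed by a linear output for regression."""
    hidden = np.maximum(X @ W1 + b1, 0.0)
    return hidden @ W2 + b2

X_toy = rng.normal(size=(5, n_features))  # 5 made-up input rows
print(forward(X_toy).shape)  # (5, 1)
```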

**A. Build and Train the Model with 1 Hidden Layer**

Let's build a network with 1 hidden layer of 20 neurons. As in Part 1, the code below uses mean absolute error (MAE) as the loss function so that the models stay comparable, and it also tracks mean squared error (MSE) as a metric:

In [None]:
# Define the model architecture
model = keras.Sequential([
        Dense(20, activation=tf.nn.relu, input_shape=[len(train_features[0])]),
        Dense(1)
])

In [None]:
# Compile the model with mean absolute error as the loss function and the Adam optimizer
model.compile(loss='mae', optimizer='adam', metrics=['mae','mse'])

In [None]:
# Train the model with a batch size of 200 and 100 epochs
model.fit(train_features, train_labels, batch_size=200, epochs=100)

**B. Model Assessment**

In [None]:
# Standardize test features
test_features_std = (test_features - train_mean) / train_std

In [None]:
# Evaluate the model with standardized features
loss = model.evaluate(test_features_std, test_labels)
print('Loss:', loss)

In [None]:
# To save the performance
performance['Multi 1'] = loss

**C. Larger Number of Hidden Layers**

(1) Repeat the process for 2 hidden layers

In [None]:
# Define the model architecture
model = keras.Sequential([
        Dense(20, activation=tf.nn.relu, input_shape=[len(train_features[0])]),
        Dense(20, activation=tf.nn.relu), # One more hidden layer
        Dense(1)
])

# Compile the model with mean absolute error as the loss function and the Adam optimizer
model.compile(loss='mae', optimizer='adam', metrics=['mae','mse'])

# Train the model with a batch size of 200 and 100 epochs
model.fit(train_features, train_labels, batch_size=200, epochs=100)

In [None]:
# Evaluate the model with standardized features
loss = model.evaluate(test_features_std, test_labels)
print('Loss:', loss)

# To save the performance
performance['Multi 2'] = loss

(2) Repeat the process for 3 hidden layers

In [None]:
# Define the model architecture
model = keras.Sequential([
        Dense(20, activation=tf.nn.relu, input_shape=[len(train_features[0])]),
        Dense(20, activation=tf.nn.relu),
        Dense(20, activation=tf.nn.relu), # Two more hidden layers
        Dense(1)
])

# Compile the model with mean absolute error as the loss function and the Adam optimizer
model.compile(loss='mae', optimizer='adam', metrics=['mae','mse'])

# Train the model with a batch size of 200 and 100 epochs
model.fit(train_features, train_labels, batch_size=200, epochs=100)

In [None]:
# Evaluate the model with standardized features
loss = model.evaluate(test_features_std, test_labels)
print('Loss:', loss)

# To save the performance
performance['Multi 3'] = loss
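
To get a feel for how much capacity each extra hidden layer adds, the sketch below counts the trainable parameters (weights plus biases) of the four architectures above. The helper `dense_param_count` is a hypothetical name introduced here; in Keras, `model.summary()` reports the same kind of count for a built model.

```python
def dense_param_count(layer_sizes):
    """Number of weights and biases in a fully connected network."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

# Layer sizes from input to output for each model trained so far
architectures = {
    'Single': [13, 1],
    'Multi 1': [13, 20, 1],
    'Multi 2': [13, 20, 20, 1],
    'Multi 3': [13, 20, 20, 20, 1],
}

param_counts = {name: dense_param_count(sizes)
                for name, sizes in architectures.items()}
print(param_counts)
# {'Single': 14, 'Multi 1': 301, 'Multi 2': 721, 'Multi 3': 1141}
```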

**D. Model Comparison**

In [None]:
# Saved results (loss / mae / mse)
performance

In [None]:
# Create a list of labels for the x-axis
labels = ['Loss', 'MAE', 'MSE']

# Create a figure and axis object
fig, ax = plt.subplots()

# Set the bar width
bar_width = 0.2

# Create a list of indices for each bar
index = np.arange(len(labels))

# Create the bars
single = ax.bar(index, performance['Single'], bar_width, label='Single')
multi1 = ax.bar(index+bar_width, performance['Multi 1'], bar_width, label='Multi 1')
multi2 = ax.bar(index+2*bar_width, performance['Multi 2'], bar_width, label='Multi 2')
multi3 = ax.bar(index+3*bar_width, performance['Multi 3'], bar_width, label='Multi 3')

# Set the x-axis labels and title
ax.set_xlabel('Metrics')
ax.set_ylabel('Performance')
ax.set_title('Performance by Model')

# Set the tick labels and position for the x-axis
ax.set_xticks(index + 1.5*bar_width)
ax.set_xticklabels(labels)

# Add a legend
ax.legend()

# Display the plot
plt.show()
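
The tick offset used in the plot above depends on how many bars are in each group. As a small sketch, the hypothetical helper `grouped_bar_layout` below computes the bar positions and the tick position that centers the label under a group of any size.

```python
import numpy as np

def grouped_bar_layout(n_groups, n_bars, bar_width):
    """Return the positions of each bar series and the centered tick positions."""
    index = np.arange(n_groups)
    positions = [index + i * bar_width for i in range(n_bars)]
    tick_centers = index + (n_bars - 1) * bar_width / 2
    return positions, tick_centers

# Three metric groups, four models per group, as in the plot above
positions, ticks = grouped_bar_layout(n_groups=3, n_bars=4, bar_width=0.2)
print(ticks)  # [0.3 1.3 2.3], i.e. index + 1.5 * bar_width
```

With five bars per group the centered offset becomes `2 * bar_width` instead of `1.5 * bar_width`.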

#### **Part 3. Experimenting with Hyperparameters**

Your decisions on hyperparameters include:
* Number of nodes/neurons
* Number of hidden layers
* Activation functions
* Learning rates
* Epochs
* Batch sizes
* Dropout
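
One way to organize such decisions is to write the candidate values down as a search space and enumerate the combinations. The sketch below does this with `itertools.product`; the keys and candidate values are hypothetical, chosen only to show how quickly the number of combinations grows.

```python
from itertools import product

# Hypothetical candidate values; choose your own ranges
search_space = {
    'units': [10, 20],
    'hidden_layers': [1, 2, 3],
    'learning_rate': [0.0005, 0.001, 0.005],
    'dropout': [0.0, 0.2],
}

# Every combination of one value per hyperparameter
configs = [dict(zip(search_space, values))
           for values in product(*search_space.values())]

print(len(configs))  # 2 * 3 * 3 * 2 = 36
print(configs[0])
```

Each dictionary in `configs` describes one model to build, train, and evaluate, so even this small grid means 36 training runs.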

Let's start with the model with 3 hidden layers, which performed the best among the candidates.

In [None]:
# Define the model architecture
model = keras.Sequential([
        Dense(20, activation=tf.nn.relu, input_shape=[len(train_features[0])]),
        Dense(20, activation=tf.nn.relu),
        Dense(20, activation=tf.nn.relu), # Two more hidden layers
        Dense(1)
])

# Compile the model with mean absolute error as the loss function and the Adam optimizer
model.compile(loss='mae', optimizer='adam', metrics=['mae','mse'])

# Train the model with a batch size of 200 and 100 epochs
model.fit(train_features, train_labels, batch_size=200, epochs=100)

In [None]:
# Evaluate the model with standardized features
loss = model.evaluate(test_features_std, test_labels)
print('Loss:', loss)

# To save the performance
new_performance = {}
new_performance['Baseline'] = loss

As the code above shows, you can easily change the number of nodes, hidden layers, activation functions, epochs, and batch sizes by editing the corresponding arguments. Here, we therefore focus on learning rates and dropout.

**A. Learning Rates**

The default learning rate for most optimizers in Keras (such as the Adam optimizer) is 0.001.

Let's try 1) a smaller rate (0.0005), and 2) a larger rate (0.005).
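
To build intuition for what the learning rate controls, the sketch below runs plain gradient descent on a toy one-dimensional problem with the three rates mentioned above. This is an assumption-laden simplification: Adam adapts its step sizes, so the numbers here will not match what happens in the Keras models.

```python
def gradient_descent(learning_rate, steps=100, w=0.0):
    """Minimize f(w) = (w - 3)^2 with plain gradient descent."""
    for _ in range(steps):
        grad = 2.0 * (w - 3.0)      # derivative of (w - 3)^2
        w -= learning_rate * grad   # step against the gradient
    return w

for lr in [0.0005, 0.001, 0.005]:
    w_final = gradient_descent(lr)
    # Distance from the minimum at w = 3 after a fixed number of steps
    print(lr, w_final, abs(w_final - 3.0))
```

With a fixed budget of steps, the larger rate gets closer to the minimum in this toy problem. Rates that are too large overshoot instead: here, `gradient_descent(1.1, steps=10)` moves away from the minimum rather than toward it.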

In [None]:
# Define the model architecture
model = keras.Sequential([
        Dense(20, activation=tf.nn.relu, input_shape=[len(train_features[0])]),
        Dense(20, activation=tf.nn.relu),
        Dense(20, activation=tf.nn.relu), # Two more hidden layers
        Dense(1)
])

# Define the optimizer with a learning rate of 0.0005
opt = keras.optimizers.Adam(learning_rate=0.0005)

# Compile the model with mean absolute error as the loss function and the Adam optimizer
model.compile(loss='mae', optimizer=opt, metrics=['mae','mse']) # Use the revised optimizer

# Train the model with a batch size of 200 and 100 epochs
model.fit(train_features, train_labels, batch_size=200, epochs=100)

In [None]:
# Evaluate the model with standardized features
loss = model.evaluate(test_features_std, test_labels)
print('Loss:', loss)

# To save the performance
new_performance['Learning 0.0005'] = loss

In [None]:
# Define the model architecture
model = keras.Sequential([
        Dense(20, activation=tf.nn.relu, input_shape=[len(train_features[0])]),
        Dense(20, activation=tf.nn.relu),
        Dense(20, activation=tf.nn.relu), # Two more hidden layers
        Dense(1)
])

# Define the optimizer with a learning rate of 0.005
opt = keras.optimizers.Adam(learning_rate=0.005)

# Compile the model with mean absolute error as the loss function and the Adam optimizer
model.compile(loss='mae', optimizer=opt, metrics=['mae','mse']) # Use the revised optimizer

# Train the model with a batch size of 200 and 100 epochs
model.fit(train_features, train_labels, batch_size=200, epochs=100)

In [None]:
# Evaluate the model with standardized features
loss = model.evaluate(test_features_std, test_labels)
print('Loss:', loss)

# To save the performance
new_performance['Learning 0.005'] = loss

**B. Dropout**
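
Dropout randomly zeroes a fraction of a layer's outputs during training and rescales the remaining ones so that the expected total is unchanged; at inference time the layer passes its input through untouched. The NumPy sketch below illustrates this "inverted dropout" idea. The function name `dropout_forward` and the activations are made up for illustration, and this is not the Keras implementation itself.

```python
import numpy as np

def dropout_forward(activations, rate, rng, training=True):
    """Inverted dropout: zero a fraction of units and rescale the survivors."""
    if not training or rate == 0.0:
        return activations
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(0)
hidden = np.ones((1000, 20))   # made-up hidden-layer activations

dropped = dropout_forward(hidden, rate=0.2, rng=rng)
print((dropped == 0).mean())   # roughly 0.2 of the units are zeroed
print(dropped.mean())          # stays close to 1.0: survivors are scaled by 1 / 0.8

inference = dropout_forward(hidden, rate=0.2, rng=rng, training=False)
print(inference.mean())        # 1.0: nothing is dropped at inference time
```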

In [None]:
# Define the model architecture
model = keras.Sequential([
        Dense(20, activation=tf.nn.relu, input_shape=[len(train_features[0])]),
        Dropout(0.2), # Dropout rate of 20%
        Dense(20, activation=tf.nn.relu),
        Dropout(0.2), # Dropout rate of 20%
        Dense(20, activation=tf.nn.relu),
        Dropout(0.2), # Dropout rate of 20%
        Dense(1)
])

# Compile the model with mean absolute error as the loss function and the Adam optimizer
model.compile(loss='mae', optimizer='adam', metrics=['mae','mse'])

# Train the model with a batch size of 200 and 100 epochs
model.fit(train_features, train_labels, batch_size=200, epochs=100)

In [None]:
# Evaluate the model with standardized features
loss = model.evaluate(test_features_std, test_labels)
print('Loss:', loss)

# To save the performance
new_performance['Dropout 0.2'] = loss

In [None]:
# Define the model architecture
model = keras.Sequential([
        Dense(20, activation=tf.nn.relu, input_shape=[len(train_features[0])]),
        Dropout(0.4), # Dropout rate of 40%
        Dense(20, activation=tf.nn.relu),
        Dropout(0.4), # Dropout rate of 40%
        Dense(20, activation=tf.nn.relu),
        Dropout(0.4), # Dropout rate of 40%
        Dense(1)
])

# Compile the model with mean absolute error as the loss function and the Adam optimizer
model.compile(loss='mae', optimizer='adam', metrics=['mae','mse'])

# Train the model with a batch size of 200 and 100 epochs
model.fit(train_features, train_labels, batch_size=200, epochs=100)

In [None]:
# Evaluate the model with standardized features
loss = model.evaluate(test_features_std, test_labels)
print('Loss:', loss)

# To save the performance
new_performance['Dropout 0.4'] = loss

**C. Model Comparison**

In [None]:
# Saved results (loss / mae / mse)
new_performance

In [None]:
# Create a list of labels for the x-axis
labels = ['Loss', 'MAE', 'MSE']

# Create a figure and axis object
fig, ax = plt.subplots()

# Set the bar width
bar_width = 0.15

# Create a list of indices for each bar
index = np.arange(len(labels))

# Create the bars
ax1 = ax.bar(index, new_performance['Baseline'], bar_width, label='Baseline')
ax2 = ax.bar(index+bar_width, new_performance['Learning 0.0005'], bar_width, label='Learning 0.0005')
ax3 = ax.bar(index+2*bar_width, new_performance['Learning 0.005'], bar_width, label='Learning 0.005')
ax4 = ax.bar(index+3*bar_width, new_performance['Dropout 0.2'], bar_width, label='Dropout 0.2')
ax5 = ax.bar(index+4*bar_width, new_performance['Dropout 0.4'], bar_width, label='Dropout 0.4')

# Set the x-axis labels and title
ax.set_xlabel('Metrics')
ax.set_ylabel('Performance')
ax.set_title('Performance by Model')

# Set the tick labels and position for the x-axis
ax.set_xticks(index + 2*bar_width) # Center the tick under the five bars in each group
ax.set_xticklabels(labels)

# Add a legend
ax.legend()

# Display the plot
plt.show()