<a href="https://colab.research.google.com/github/haoming150ty/Personal-Portfolio/blob/main/Assignment_02_Haoming_Zhang.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **Assignment 02: Improving Deep Learning Models (100 pts)**
- Instructor: [Jaeung Sim](https://jaeungs.github.io/) (University of Connecticut)
- Course: OPIM 5512 Data Science Using Python
- Release Date: March 13 (Thu), 2025
- Submission Due: March 28 (Fri), 2025

**Objectives**
1. Experiment with hyperparameters and observe how model performance changes.
1. Improve model performance by tuning hyperparameters.
1. Achieve as low a loss as possible.

## **Introduction**

* You're going to experiment with the hyperparameters discussed in class. Changing hyperparameters can have unexpected consequences, so keep an open mind as you look for breakthroughs.

* Follow each step, run your own code, and submit the notebook file on HuskyCT. The maximum score is 100; there are no extra points in this assignment.

* This time, **you can't get help from your friends.** You're on your own. You may still use online documents and artificial intelligence (e.g., ChatGPT). **Importantly, the instructor will answer clarification questions only.** I will not review or comment on your Python code.

#### **Step 1. Data Processing**

##### **A. Setup and Data Loading**

You may change the code here to suit your environment. **But you should not change the name of the DataFrame.**

In [None]:
# Set your Google Drive directory
import os
os.getcwd()

from google.colab import drive
drive.mount('/content/drive')

os.chdir('/content/drive/My Drive/OPIM 5512') # You may need to change this directory

Mounted at /content/drive


In [None]:
# TensorFlow and tf.keras
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

# Commonly used modules
import numpy as np
import os
import sys

# Images, plots, display, and visualization
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import cv2
import IPython
from six.moves import urllib

print(tf.__version__)

2.18.0


Please download `dataset_assignment_02.csv` from HuskyCT and place it in your working directory. This dataset, generated by the instructor (**Jaeung Sim**), includes one target variable (`y`) and 15 input variables (`x1` to `x15`). Because of how the data were generated, you may not need to standardize or normalize the input variables.

In [None]:
# Data Loading
df = pd.read_csv('dataset_assignment_02.csv')

In [None]:
# Explore the head of the dataset
df.head(20)

Unnamed: 0,y,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15
0,59.064028,0.092627,-0.616344,1.01023,2.174466,1,0.997493,-0.536556,0.714396,0.485646,1,0.218922,0.022818,0.344689,0.08816,0.083429
1,-104.243187,0.360192,-0.398255,0.255686,0.965165,1,-0.55148,0.904158,0.489067,0.069215,1,0.072969,0.049336,-0.385529,0.031542,0.391962
2,92.45651,0.35124,0.440209,0.391802,0.996526,1,-0.995572,0.724013,0.280797,0.069975,0,0.04118,0.154431,0.259341,-0.002101,0.3548
3,10.851567,0.680396,0.235624,1.303907,1.103386,0,0.547052,0.9959,0.367699,-0.886281,0,0.269877,0.098021,-0.938547,0.042472,0.266599
4,210.625565,0.369461,-0.825356,2.781261,1.70167,1,0.827293,-0.947857,0.207226,0.039131,1,0.435014,0.275365,0.100224,0.049743,0.034647
5,108.686823,0.606444,-1.056586,0.032153,1.842594,1,-0.985843,-0.808329,0.893509,0.565457,0,0.290238,0.024151,-0.751336,0.087975,0.143585
6,74.479498,0.449913,1.052614,0.375435,1.452903,1,-0.976122,-0.425896,0.289792,0.661508,0,0.047633,0.034497,0.496091,0.031193,0.34772
7,-18.320115,0.634891,-0.310191,1.433664,0.931551,1,0.976391,-0.98962,0.602963,-0.305327,0,0.346582,0.022114,-0.610197,0.078646,0.08252
8,11.955305,0.412335,-0.281695,1.528866,1.068423,0,0.809005,-0.378763,0.251986,0.372649,0,0.535455,0.197533,-0.307524,0.09725,0.097163
9,82.472151,0.256528,-1.114048,0.245172,1.11612,1,0.870437,-0.891208,0.904717,0.881962,1,0.346526,0.002445,-0.417299,0.086395,0.373682


In [None]:
# Explore the tail of the dataset
df.tail(20)

Unnamed: 0,y,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15
29980,-122.702466,0.39784,0.900086,0.31485,2.156017,1,0.967307,0.938976,0.632114,-0.449977,1,0.234712,0.055163,0.604103,0.061053,0.529036
29981,-57.365211,0.45771,-1.287472,1.781938,1.937985,1,0.708915,0.974994,0.790703,1.133278,1,0.642051,0.513329,-0.619176,0.040939,0.09382
29982,44.435817,0.51516,-0.83397,0.669078,1.428992,1,-0.391362,0.979343,0.951212,0.278342,0,0.210977,0.298946,-0.612391,0.003103,0.527915
29983,-8.196866,0.646681,-0.74736,0.226113,1.56226,0,0.773139,-0.632539,0.503871,1.117907,0,0.592148,0.236186,-0.960458,0.096208,0.133148
29984,-3.291102,0.163411,0.630628,0.827184,2.078133,0,-0.990036,0.760767,0.286843,0.024226,1,0.832941,0.072299,0.45891,0.223716,0.60967
29985,52.195376,0.952352,0.759569,1.061226,2.111333,1,0.094293,0.69982,0.505316,-0.460673,0,0.50776,0.073371,-0.238914,0.118022,0.000673
29986,51.846094,0.177249,-0.327842,2.434835,1.53854,0,0.375706,0.523736,0.462695,-1.00052,0,0.307139,0.02537,0.139249,0.106558,0.21037
29987,11.326324,0.425648,0.023454,2.750629,1.973477,1,0.179919,0.992137,0.395314,-0.930661,1,0.095678,0.074353,-0.674701,0.042879,0.828106
29988,2.390723,0.850333,-0.97998,0.728154,1.28859,1,-0.855222,-0.019404,0.315912,-0.552394,0,0.568589,0.040785,0.345126,0.153801,0.014149
29989,-3.363532,0.054815,-0.194828,0.398727,1.180171,1,0.447602,0.115843,0.148251,-1.240882,1,0.767536,0.009179,-0.388533,0.197059,0.599797


In [None]:
# Explore the shape of the dataset
print("Shape of Raw Dataset: ", df.shape)

Shape of Raw Dataset:  (30000, 16)


##### **B. Data Partition**

In this part, you will divide the raw dataset into training and test sets. To ensure traceability and replicability, **you should run the code in this section without revision.**

In [None]:
# Import 'sklearn' functions
from sklearn.model_selection import train_test_split

In [None]:
# Set Y variable
Y = df["y"]
Y

Unnamed: 0,y
0,59.064028
1,-104.243187
2,92.456510
3,10.851567
4,210.625565
...,...
29995,119.097917
29996,-13.577414
29997,88.986861
29998,44.554191


In [None]:
# Set X variable
X = df
X.pop("y")
X

Unnamed: 0,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15
0,0.092627,-0.616344,1.010230,2.174466,1,0.997493,-0.536556,0.714396,0.485646,1,0.218922,0.022818,0.344689,0.088160,0.083429
1,0.360192,-0.398255,0.255686,0.965165,1,-0.551480,0.904158,0.489067,0.069215,1,0.072969,0.049336,-0.385529,0.031542,0.391962
2,0.351240,0.440209,0.391802,0.996526,1,-0.995572,0.724013,0.280797,0.069975,0,0.041180,0.154431,0.259341,-0.002101,0.354800
3,0.680396,0.235624,1.303907,1.103386,0,0.547052,0.995900,0.367699,-0.886281,0,0.269877,0.098021,-0.938547,0.042472,0.266599
4,0.369461,-0.825356,2.781261,1.701670,1,0.827293,-0.947857,0.207226,0.039131,1,0.435014,0.275365,0.100224,0.049743,0.034647
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
29995,0.682372,1.181348,1.002310,2.141294,0,-0.894601,-0.918162,0.309426,-0.595277,0,0.456331,0.315299,-0.013728,0.072217,0.296694
29996,0.441985,0.210855,0.626278,1.364052,0,-0.954581,0.997034,0.064649,0.404951,1,0.126409,0.077912,-0.131121,0.016440,0.036783
29997,0.890115,1.652265,0.258011,1.145487,0,-0.811441,-0.511596,0.364253,-0.121384,0,0.036219,0.042292,0.739291,0.045795,0.904232
29998,0.123513,0.482803,0.564734,1.439070,0,-0.364648,-0.473517,0.170413,0.177105,0,0.113555,0.112393,-0.580274,0.050031,0.797004
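
Note that `X = df` binds a second name to the same object rather than copying it, so `X.pop("y")` also removes `y` from `df`. The sketch below illustrates that aliasing with a plain dict standing in for the DataFrame; the dict and its values are made up for illustration and are not the assignment data.

```python
# A plain dict stands in for the DataFrame; the values are illustrative only.
table = {"y": [1.0, 2.0], "x1": [0.1, 0.2]}

alias = table            # no copy: both names refer to the same object
target = alias.pop("y")  # removes "y" through the alias

print(sorted(table))     # the original name sees the change too
```

This is why `Y = df["y"]` has to run before the `pop` call.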


In [None]:
# Sample a training set while holding out 30% of the data for testing
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=1234)

In [None]:
# Check the matrix shapes
print("Shape of Train Set (X, Y):", X_train.shape, Y_train.shape)
print("Shape of Test Set (X, Y):", X_test.shape, Y_test.shape)

Shape of Train Set (X, Y): (21000, 15) (21000,)
Shape of Test Set (X, Y): (9000, 15) (9000,)
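
The shapes above follow directly from the 30% hold-out. As a quick arithmetic check (plain Python, not scikit-learn's internal code; the variable names are illustrative):

```python
n_rows = 30000       # rows in the raw dataset
test_percent = 30    # test_size=0.3 expressed as a percentage

n_test = n_rows * test_percent // 100  # rows held out for testing
n_train = n_rows - n_test              # rows left for training

print(n_train, n_test)
```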


In [None]:
# Explore the DataFrames
X_train.head(10)

Unnamed: 0,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15
5030,0.830337,-1.34863,1.505998,1.24394,0,-0.063637,0.298929,0.547228,0.980524,0,0.351949,0.134515,0.405376,0.080053,0.433474
4239,0.175136,-0.830026,1.17223,2.86448,1,0.69479,1.0,0.511635,-1.026502,0,0.253371,0.0202,-0.48766,0.068492,0.767384
14120,0.310286,-1.399365,0.6552,2.101705,0,-0.671872,-0.450896,0.127184,-0.317015,0,0.045418,0.12107,-0.999891,-0.015662,0.178232
13866,0.545079,0.740424,0.010513,1.768809,0,-0.545971,0.542386,0.139906,0.064852,1,0.411732,0.034865,-0.127736,0.106032,0.303804
20220,0.078898,0.456369,0.1543,0.874005,0,-0.884583,0.971968,0.962437,-0.786529,1,0.296597,0.061989,-0.332989,0.061222,0.566569
15724,0.522498,0.976123,0.837396,1.89755,1,0.01728,0.688607,0.075104,0.054994,1,0.015088,0.160762,-0.36431,-0.005234,0.118435
26058,0.309874,0.153299,0.038736,1.916747,1,0.769164,0.550701,0.37176,-0.878779,1,0.193955,0.005066,0.19567,0.089964,0.094089
26665,0.235744,1.018249,2.002586,2.201617,1,0.108409,0.408213,0.828896,-0.473336,1,0.868925,0.295895,-0.864794,0.163245,0.340676
21059,0.948247,0.342384,1.187768,2.080069,0,-0.562353,0.991227,0.294229,1.018352,0,0.328291,0.00461,-0.534615,0.099731,0.000199
7788,0.886802,0.502829,0.34704,1.07062,0,-0.227978,0.500825,0.441789,-0.386195,0,0.628746,0.015683,-0.266532,0.165394,0.892846


In [None]:
Y_train.head(10)

Unnamed: 0,y
5030,34.735132
4239,44.291555
14120,31.491459
13866,-8.683834
20220,-89.197741
15724,38.67313
26058,-74.111956
26665,89.157506
21059,62.137223
7788,41.608962


In [None]:
X_test.head(10)

Unnamed: 0,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15
13125,0.986433,0.101937,0.159031,1.970251,0,0.974038,-0.131043,0.16442,0.653012,0,0.265201,0.04362,-0.782142,0.07763,0.033281
14635,0.767478,-0.431052,3.135878,1.933838,1,-0.249632,0.948764,0.005937,-0.751717,0,0.110173,0.067072,0.154542,0.030345,0.016297
19429,0.352949,0.842864,1.671222,1.438546,0,-0.693327,0.892089,0.123208,-1.294293,0,0.856533,0.055441,0.775242,0.258909,0.577208
4381,0.237681,-0.685494,1.914837,1.570059,1,-0.264918,0.496601,0.23277,-0.116019,0,0.48607,0.029283,-0.320918,0.148097,0.025689
7659,0.141268,0.058747,3.025678,1.935007,1,0.65329,0.198428,0.296989,-1.250158,1,0.033149,0.351685,0.175874,-0.045462,0.522686
10637,0.186597,0.445117,1.101864,0.869283,1,-0.091805,0.999073,0.761553,0.232988,1,0.322511,0.356911,0.296305,0.03626,0.088549
17045,0.52321,0.371765,0.315079,2.038442,1,0.95496,-0.342773,0.968143,0.838122,1,0.317342,0.344666,-0.474761,0.039049,0.296147
24978,0.287102,-0.635989,0.032661,1.49541,1,0.489368,-0.478478,0.446252,-1.130273,1,0.092086,0.081887,-0.496986,0.004575,0.097796
13639,0.675054,-0.472454,0.210905,1.562585,1,-0.997666,0.881135,0.969532,0.109907,0,0.847673,0.079667,-0.168957,0.215391,0.622419
9468,0.493854,0.209625,1.295365,1.864519,1,-0.401776,-0.984597,0.057949,-1.037027,1,0.332311,0.141056,-0.909323,0.067077,0.368561


In [None]:
Y_test.head(10)

Unnamed: 0,y
13125,16.084036
14635,73.100874
19429,88.114504
4381,128.835435
7659,103.259832
10637,-51.006675
17045,92.479078
24978,95.867235
13639,71.900774
9468,193.957762


#### **Step 2. Deep Learning (100 pts)**

Here, you're going to experiment with several hyperparameters to improve the predictive performance of your deep learning model. In the following sections, you will practice:
* Number of Nodes and Hidden Layers
* Activation Functions
* Learning Rates
* Epochs
* Batch Sizes
* Dropout Rates

##### **A. Number of Nodes and Hidden Layers (25 pts)**

**Question A1. Build, train, and assess a neural network model with 1 hidden layer of 20 neurons.** Use all 15 input variables, a ReLU activation function, and mean absolute error (MAE) as your loss function. Set the batch size to 1000, the number of epochs to 200, and the learning rate to 0.001; use the Adam optimizer and no dropout.
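
For reference, the two metrics tracked throughout this notebook can be written out by hand. The following is a minimal sketch in plain Python with made-up numbers (the helper names and sample values are assumptions for illustration, not Keras code); it shows why one large miss inflates MSE much more than MAE.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of |true - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_squared_error(y_true, y_pred):
    """Average of (true - predicted)^2."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [10.0, 20.0, 30.0]
y_pred = [12.0, 20.0, 20.0]  # one large miss on the last sample

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
print(mae, mse)
```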

In [None]:
### To be coded by students ###
model = keras.Sequential([
    Dense(20, activation=tf.nn.relu, input_shape=(X_train.shape[1],)),
    Dense(1)  # Output layer
])
# Compile the model
from tensorflow.keras.optimizers import Adam
optimizer = Adam(learning_rate=0.001)
model.compile(loss='mae', optimizer=optimizer, metrics=['mae', 'mse'])

# Train the model
model.fit(X_train, Y_train, batch_size=1000, epochs=200, verbose=1)

  super().__init__(activity_regularizer=activity_regularizer, **kwargs)


Epoch 1/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 1s 4ms/step - loss: 65.0967 - mae: 65.0967 - mse: 6801.5859
Epoch 2/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - loss: 65.0803 - mae: 65.0803 - mse: 6789.9204
Epoch 3/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - loss: 64.5572 - mae: 64.5572 - mse: 6705.0356
Epoch 4/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - loss: 64.0076 - mae: 64.0076 - mse: 6588.8828
Epoch 5/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - loss: 63.6641 - mae: 63.6641 - mse: 6495.5859
Epoch 6/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - loss: 62.9469 - mae: 62.9469 - mse: 6402.8853
Epoch 7/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - loss: 62.1459 - mae: 62.1459 - mse: 6225.5542
Epoch 8/200
21/21 ━━━━━━━━━━━━━━━━━━━━

<keras.src.callbacks.history.History at 0x7aa3c402d190>

In [None]:
### Run this code after training your model ###

# Evaluate the model with standardized features
loss = model.evaluate(X_test, Y_test) # You might change the model's name
print('Loss:', loss)

# To save the performance
performance_A = {}
performance_A['(20 x 1)'] = loss

282/282 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 31.6986 - mae: 31.6986 - mse: 1676.9381
Loss: [32.059452056884766, 32.059452056884766, 1678.92333984375]
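
The step counts in the progress bars (21 during training, 282 during evaluation) are the number of batches needed to cover the data once. A minimal sketch of that arithmetic, assuming `model.evaluate` falls back to a batch size of 32 when none is passed (the helper name `steps` is illustrative):

```python
import math

def steps(n_samples, batch_size):
    """Number of batches needed to cover n_samples once."""
    return math.ceil(n_samples / batch_size)

train_steps = steps(21000, 1000)  # training set with batch_size=1000
eval_steps = steps(9000, 32)      # test set, assuming a default batch size of 32

print(train_steps, eval_steps)
```

The list printed after `Loss:` holds the loss first, followed by the metrics in the order given to `compile`, so here it reads `[mae (loss), mae, mse]`.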


**Question A2. Build, train, and assess a neural network model with 3 hidden layers of 30 neurons each.** Use all 15 input variables, a ReLU activation function, and mean absolute error (MAE) as your loss function. Set the batch size to 1000, the number of epochs to 200, and the learning rate to 0.001; use the Adam optimizer and no dropout.

In [None]:
### To be coded by students ###
import random
# Seeds Python's random module only; keras.utils.set_random_seed(1234) would also seed NumPy and TensorFlow
random.seed(1234)
model = keras.Sequential([
    Dense(30, activation=tf.nn.relu, input_shape=(X_train.shape[1],)),
    Dense(30, activation=tf.nn.relu),  # input_shape is only needed on the first layer
    Dense(30, activation=tf.nn.relu),
    Dense(1)  # Output layer
])
optimizer = Adam(learning_rate=0.001)
model.compile(loss='mae', optimizer=optimizer, metrics=['mae', 'mse'])
model.fit(X_train, Y_train, batch_size=1000, epochs=200, verbose=1)

  super().__init__(activity_regularizer=activity_regularizer, **kwargs)

Epoch 1/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - loss: 65.4378 - mae: 65.4378 - mse: 6860.2656
Epoch 2/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - loss: 64.7059 - mae: 64.7059 - mse: 6720.6636
Epoch 3/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - loss: 63.5633 - mae: 63.5633 - mse: 6483.6602
Epoch 4/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - loss: 60.6447 - mae: 60.6447 - mse: 5960.5986
Epoch 5/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - loss: 54.4026 - mae: 54.4026 - mse: 4848.0449
Epoch 6/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 51.9513 - mae: 51.9513 - mse: 4460.0005
Epoch 7/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - loss: 50.3650 - mae: 50.3650 - mse: 4146.2339
Epoch 8/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/s

<keras.src.callbacks.history.History at 0x7aa3c01e3d10>

In [None]:
### Run this code after training your model ###

# Evaluate the model with standardized features
loss = model.evaluate(X_test, Y_test) # You might change the model's name
print('Loss:', loss)

# To save the performance
performance_A['(30 x 3)'] = loss

282/282 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 28.4648 - mae: 28.4648 - mse: 1288.1936
Loss: [28.562971115112305, 28.562971115112305, 1281.6761474609375]
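
To put the two architectures side by side, it can help to count their trainable parameters. The sketch below does this by hand for fully connected layers (weights plus biases); `dense_param_count` is a hypothetical helper written for this note, not a Keras function.

```python
def dense_param_count(layer_sizes):
    """Weights plus biases for a stack of fully connected layers.

    layer_sizes lists the input width followed by each layer's width.
    """
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

small = dense_param_count([15, 20, 1])          # Question A1: one hidden layer of 20
large = dense_param_count([15, 30, 30, 30, 1])  # Question A2: three hidden layers of 30

print(small, large)
```

In a live session, `model.summary()` reports the same kind of count for the model that was actually built.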


##### **B. Activation Functions (25 pts)**

**Question B1. You're going to make only one change to the model in Question A2. Please replace the current activation function (ReLU) with 1) the sigmoid function and 2) the softplus function** (You should train the model with sigmoid and the one with softplus separately)**. Do you find any improvement compared to the model in Question A2?**

In [None]:
### To be coded by students ###
# For replicability. Seeds Python's random module only; keras.utils.set_random_seed(1234) would also seed NumPy and TensorFlow
random.seed(1234)
model = keras.Sequential([
    Dense(30, activation=tf.nn.sigmoid, input_shape=(X_train.shape[1],)),
    Dense(30, activation=tf.nn.sigmoid),  # input_shape is only needed on the first layer
    Dense(30, activation=tf.nn.sigmoid),
    Dense(1)  # Output layer
])

# Compile the model
optimizer = Adam(learning_rate=0.001)
model.compile(loss='mae', optimizer=optimizer, metrics=['mae', 'mse'])

# Train the model
model.fit(X_train, Y_train, batch_size=1000, epochs=200, verbose=1)

  super().__init__(activity_regularizer=activity_regularizer, **kwargs)

Epoch 1/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 2s 9ms/step - loss: 65.7002 - mae: 65.7002 - mse: 6869.1270
Epoch 2/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - loss: 64.8035 - mae: 64.8035 - mse: 6737.1187
Epoch 3/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - loss: 64.5445 - mae: 64.5445 - mse: 6713.4424
Epoch 4/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - loss: 63.7187 - mae: 63.7187 - mse: 6528.5703
Epoch 5/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/step - loss: 63.9534 - mae: 63.9534 - mse: 6579.0415
Epoch 6/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - loss: 63.1046 - mae: 63.1046 - mse: 6428.0796
Epoch 7/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - loss: 63.1881 - mae: 63.1881 - mse: 6448.0063
Epoch 8/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/s

<keras.src.callbacks.history.History at 0x7aa3c36ccb10>

In [None]:
### Run this code after training your model ###

# Evaluate the model with standardized features
loss = model.evaluate(X_test, Y_test) # You might change the model's name
print('Loss:', loss)

# To save the performance
performance_B = {}
performance_B['Sigmoid'] = loss

282/282 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 38.6833 - mae: 38.6833 - mse: 2638.3806
Loss: [38.52397918701172, 38.52397918701172, 2586.60107421875]


In [None]:
### To be coded by students ###
# For replicability. Seeds Python's random module only; keras.utils.set_random_seed(1234) would also seed NumPy and TensorFlow
random.seed(1234)
model = keras.Sequential([
    Dense(30, activation=tf.nn.softplus, input_shape=(X_train.shape[1],)),
    Dense(30, activation=tf.nn.softplus),  # input_shape is only needed on the first layer
    Dense(30, activation=tf.nn.softplus),
    Dense(1)  # Output layer
])

# Compile the model
optimizer = Adam(learning_rate=0.001)
model.compile(loss='mae', optimizer=optimizer, metrics=['mae', 'mse'])

# Train the model
model.fit(X_train, Y_train, batch_size=1000, epochs=200, verbose=1)

  super().__init__(activity_regularizer=activity_regularizer, **kwargs)

Epoch 1/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - loss: 65.1390 - mae: 65.1390 - mse: 6781.8530
Epoch 2/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 63.8832 - mae: 63.8832 - mse: 6617.4653
Epoch 3/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - loss: 61.7343 - mae: 61.7343 - mse: 6210.5542
Epoch 4/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - loss: 57.6499 - mae: 57.6499 - mse: 5486.0322
Epoch 5/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 53.9363 - mae: 53.9363 - mse: 4831.5796
Epoch 6/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - loss: 53.2481 - mae: 53.2481 - mse: 4660.3535
Epoch 7/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - loss: 53.1722 - mae: 53.1722 - mse: 4688.6074
Epoch 8/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/s

<keras.src.callbacks.history.History at 0x7aa3bb530d90>

In [None]:
### Run this code after training your model ###

# Evaluate the model with standardized features
loss = model.evaluate(X_test, Y_test) # You might change the model's name
print('Loss:', loss)

# To save the performance
performance_B['Softplus'] = loss

282/282 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 28.5318 - mae: 28.5318 - mse: 1296.5765
Loss: [28.669757843017578, 28.669757843017578, 1296.729248046875]


In [None]:
# FYI: Model performances in Sections A and B
print(performance_A['(20 x 1)'])
print(performance_A['(30 x 3)'])
print(performance_B['Sigmoid'])
print(performance_B['Softplus'])

[32.059452056884766, 32.059452056884766, 1678.92333984375]
[28.562971115112305, 28.562971115112305, 1281.6761474609375]
[38.52397918701172, 38.52397918701172, 2586.60107421875]
[28.669757843017578, 28.669757843017578, 1296.729248046875]


**Do you find any improvement compared to the model in Question A2?**

* **Your answer:**
No. Among the three activation functions, ReLU (the A2 model) performs best on the test set, with an MAE of 28.56 and an MSE of 1281.68. Softplus comes a close second (MAE 28.67, MSE 1296.73), while sigmoid is clearly worse (MAE 38.52, MSE 2586.60). Neither replacement improves on the model in Question A2, although the gap between ReLU and softplus is small enough that it could change from one training run to another.
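
For reference, the three activation functions compared here can be written out directly. This is a minimal sketch in plain Python using the standard textbook definitions, not the TensorFlow implementations:

```python
import math

def sigmoid(x):
    """Squashes any input into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def softplus(x):
    """Smooth approximation of ReLU: log(1 + e^x)."""
    return math.log1p(math.exp(x))

def relu(x):
    """Passes positive inputs through unchanged and zeroes out the rest."""
    return max(0.0, x)

for x in (-5.0, 0.0, 5.0):
    print(f"x={x:+.1f}  sigmoid={sigmoid(x):.4f}  "
          f"softplus={softplus(x):.4f}  relu={relu(x):.4f}")
```

Sigmoid outputs are confined to (0, 1) and flatten out for large inputs, which is one plausible reason the all-sigmoid model trains more slowly here; softplus and ReLU are unbounded above.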

**Question B2. Revise the model in Question B1 as follows. For the first/second/third layers, use the activation function sigmoid/softplus/relu, respectively. Do you find any improvement compared to the models in Question A2 and Question B1?**

In [None]:
### To be coded by students ###
# For replicability. Seeds Python's random module only; keras.utils.set_random_seed(1234) would also seed NumPy and TensorFlow
random.seed(1234)
model = keras.Sequential([
    Dense(30, activation=tf.nn.sigmoid, input_shape=(X_train.shape[1],)),
    Dense(30, activation=tf.nn.softplus),  # input_shape is only needed on the first layer
    Dense(30, activation=tf.nn.relu),
    Dense(1)  # Output layer
])

# Compile the model
optimizer = Adam(learning_rate=0.001)
model.compile(loss='mae', optimizer=optimizer, metrics=['mae', 'mse'])

# Train the model
model.fit(X_train, Y_train, batch_size=1000, epochs=200, verbose=1)

  super().__init__(activity_regularizer=activity_regularizer, **kwargs)

Epoch 1/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 3s 8ms/step - loss: 65.7949 - mae: 65.7949 - mse: 6928.3809
Epoch 2/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 65.0175 - mae: 65.0175 - mse: 6787.8281
Epoch 3/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - loss: 63.5190 - mae: 63.5190 - mse: 6548.0864
Epoch 4/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 61.7762 - mae: 61.7762 - mse: 6211.4575
Epoch 5/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 58.9965 - mae: 58.9965 - mse: 5740.5859
Epoch 6/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - loss: 55.6399 - mae: 55.6399 - mse: 5099.9839
Epoch 7/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 54.9506 - mae: 54.9506 - mse: 5021.7783
Epoch 8/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/s

<keras.src.callbacks.history.History at 0x7aa3bb537350>

In [None]:
### Run this code after training your model ###

# Evaluate the model with standardized features
loss = model.evaluate(X_test, Y_test) # You might change the model's name
print('Loss:', loss)

# To save the performance
performance_B['Sigmoid/Softplus/ReLU'] = loss

282/282 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 28.5923 - mae: 28.5923 - mse: 1311.2065
Loss: [28.712177276611328, 28.712177276611328, 1303.4306640625]


In [None]:
# FYI: Model performances in Sections A and B
print(performance_A['(20 x 1)'])
print(performance_A['(30 x 3)'])
print(performance_B['Sigmoid'])
print(performance_B['Softplus'])
print(performance_B['Sigmoid/Softplus/ReLU'])

[32.059452056884766, 32.059452056884766, 1678.92333984375]
[28.562971115112305, 28.562971115112305, 1281.6761474609375]
[38.52397918701172, 38.52397918701172, 2586.60107421875]
[28.669757843017578, 28.669757843017578, 1296.729248046875]
[28.712177276611328, 28.712177276611328, 1303.4306640625]
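
The raw lists above are easier to compare once ranked. The sketch below is one way to do that; the numbers are copied from the printout and rounded to four decimals, and the `results` dict is a stand-in for `performance_A` and `performance_B` so that the snippet runs on its own.

```python
# Test-set results copied from the printout above, rounded to four decimals: (MAE, MSE).
results = {
    "(20 x 1)": (32.0595, 1678.9233),
    "(30 x 3)": (28.5630, 1281.6761),
    "Sigmoid": (38.5240, 2586.6011),
    "Softplus": (28.6698, 1296.7292),
    "Sigmoid/Softplus/ReLU": (28.7122, 1303.4307),
}

# Rank the models from lowest to highest test MAE.
ranked = sorted(results.items(), key=lambda item: item[1][0])

print(f"{'Model':<24}{'MAE':>10}{'MSE':>12}")
for name, (mae, mse) in ranked:
    print(f"{name:<24}{mae:>10.4f}{mse:>12.4f}")

best_name = ranked[0][0]
```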


**Do you find any improvement compared to the models in Question A2 and Question B1?**

* **Your answer:**
No. The A2 model, with three hidden layers that all use ReLU, still has the lowest test MAE (28.56) and MSE (1281.68). The sigmoid/softplus/ReLU model reaches an MAE of 28.71 and an MSE of 1303.43, slightly behind both the ReLU and softplus models but far ahead of the all-sigmoid model (MAE 38.52). This suggests that placing softplus and ReLU layers after a sigmoid layer may offset much of the sigmoid layer's weakness, but on this dataset the mixture does not beat ReLU alone.

##### **C. Learning Rates, Epochs, and Batch Sizes (25 pts)**

**Question C1. Starting from the model in Question A2, change only the training settings. Train and assess four models that use the (learning_rates, batch_size, epochs) combinations (0.01, 1000, 200), (0.001, 1000, 400), (0.01, 1000, 400), and (0.01, 500, 400), respectively. Do you find any improvement compared to the model in Question A2?**
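
Batch size and epochs together determine how many weight updates each model receives, which helps when comparing the four runs. A minimal sketch of that arithmetic, assuming one optimizer update per batch and the 21,000-row training set from Step 1 (the variable names are illustrative):

```python
import math

n_train = 21000  # rows in the training set

# (learning_rate, batch_size, epochs) combinations from Question C1
combos = [(0.01, 1000, 200), (0.001, 1000, 400), (0.01, 1000, 400), (0.01, 500, 400)]

updates = {}
for lr, batch_size, epochs in combos:
    steps_per_epoch = math.ceil(n_train / batch_size)  # batches per pass over the data
    updates[(lr, batch_size, epochs)] = steps_per_epoch * epochs

for combo, n_updates in updates.items():
    print(combo, n_updates)
```

Halving the batch size doubles the number of updates per epoch, so the last combination performs four times as many updates as the first.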

1) (learning_rates, batch_size, epochs) = (0.01, 1000, 200)

In [None]:
### To be coded by students ###
# For replicability. Seeds Python's random module only; keras.utils.set_random_seed(1234) would also seed NumPy and TensorFlow
random.seed(1234)
model = keras.Sequential([
    Dense(30, activation=tf.nn.relu, input_shape=(X_train.shape[1],)),
    Dense(30, activation=tf.nn.relu),  # input_shape is only needed on the first layer
    Dense(30, activation=tf.nn.relu),
    Dense(1)  # Output layer
])

# Compile the model
optimizer = Adam(learning_rate=0.01)
model.compile(loss='mae', optimizer=optimizer, metrics=['mae', 'mse'])

# Train the model
model.fit(X_train, Y_train, batch_size=1000, epochs=200, verbose=1)

  super().__init__(activity_regularizer=activity_regularizer, **kwargs)

Epoch 1/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - loss: 62.7873 - mae: 62.7873 - mse: 6397.7202
Epoch 2/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 51.0512 - mae: 51.0512 - mse: 4294.3447
Epoch 3/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 45.0103 - mae: 45.0103 - mse: 3347.3474
Epoch 4/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 41.9882 - mae: 41.9882 - mse: 2871.1008
Epoch 5/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - loss: 36.9749 - mae: 36.9749 - mse: 2311.2424
Epoch 6/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - loss: 35.4480 - mae: 35.4480 - mse: 2180.1990
Epoch 7/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - loss: 34.2590 - mae: 34.2590 - mse: 1987.3632
Epoch 8/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/s

<keras.src.callbacks.history.History at 0x7aa3bb1a3850>

In [None]:
### Run this code after training your model ###

# Evaluate the model with standardized features
loss = model.evaluate(X_test, Y_test) # You might change the model's name
print('Loss:', loss)

# To save the performance
performance_C = {}
performance_C['(0.01, 1000, 200)'] = loss

282/282 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 28.8206 - mae: 28.8206 - mse: 1294.2103
Loss: [28.880550384521484, 28.880550384521484, 1300.154541015625]


2) (learning_rates, batch_size, epochs) = (0.001, 1000, 400)

In [None]:
### To be coded by students ###
# For replicability. Seeds Python's random module only; keras.utils.set_random_seed(1234) would also seed NumPy and TensorFlow
random.seed(1234)
model = keras.Sequential([
    Dense(30, activation=tf.nn.relu, input_shape=(X_train.shape[1],)),
    Dense(30, activation=tf.nn.relu),  # input_shape is only needed on the first layer
    Dense(30, activation=tf.nn.relu),
    Dense(1)  # Output layer
])

# Compile the model
optimizer = Adam(learning_rate=0.001)
model.compile(loss='mae', optimizer=optimizer, metrics=['mae', 'mse'])

# Train the model
model.fit(X_train, Y_train, batch_size=1000, epochs=400, verbose=1)

  super().__init__(activity_regularizer=activity_regularizer, **kwargs)

Epoch 1/400
21/21 ━━━━━━━━━━━━━━━━━━━━ 2s 10ms/step - loss: 65.8838 - mae: 65.8838 - mse: 6967.3667
Epoch 2/400
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - loss: 65.4157 - mae: 65.4157 - mse: 6870.7520
Epoch 3/400
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - loss: 63.2945 - mae: 63.2945 - mse: 6445.7310
Epoch 4/400
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - loss: 59.8210 - mae: 59.8210 - mse: 5789.9233
Epoch 5/400
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - loss: 54.4834 - mae: 54.4834 - mse: 4908.5508
Epoch 6/400
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - loss: 52.0604 - mae: 52.0604 - mse: 4501.8638
Epoch 7/400
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 8ms/step - loss: 50.2769 - mae: 50.2769 - mse: 4190.3413
Epoch 8/400
21/21 ━━━━━━━━━━━━━━━━━━━━ 1s 8ms/

<keras.src.callbacks.history.History at 0x7aa3b9c82f50>

In [None]:
### Run this code after training your model ###

# Evaluate the model with standardized features
loss = model.evaluate(X_test, Y_test) # You might change the model's name
print('Loss:', loss)

# To save the performance
performance_C['(0.001, 1000, 400)'] = loss

282/282 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 28.4781 - mae: 28.4781 - mse: 1280.6718
Loss: [28.607107162475586, 28.607107162475586, 1280.333984375]


3) (learning_rates, batch_size, epochs) = (0.01, 1000, 400)

In [None]:
### To be coded by students ###
# For replicability. Seeds Python's random module only; keras.utils.set_random_seed(1234) would also seed NumPy and TensorFlow
random.seed(1234)
model = keras.Sequential([
    Dense(30, activation=tf.nn.relu, input_shape=(X_train.shape[1],)),
    Dense(30, activation=tf.nn.relu),  # input_shape is only needed on the first layer
    Dense(30, activation=tf.nn.relu),
    Dense(1)  # Output layer
])

# Compile the model
optimizer = Adam(learning_rate=0.01)
model.compile(loss='mae', optimizer=optimizer, metrics=['mae', 'mse'])

# Train the model
model.fit(X_train, Y_train, batch_size=1000, epochs=400, verbose=1)

  super().__init__(activity_regularizer=activity_regularizer, **kwargs)

Epoch 1/400
21/21 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - loss: 63.5574 - mae: 63.5574 - mse: 6504.8867
Epoch 2/400
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - loss: 50.5745 - mae: 50.5745 - mse: 4210.4399
Epoch 3/400
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - loss: 45.5919 - mae: 45.5919 - mse: 3380.5046
Epoch 4/400
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - loss: 43.0284 - mae: 43.0284 - mse: 3021.0872
Epoch 5/400
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 39.2951 - mae: 39.2951 - mse: 2585.5027
Epoch 6/400
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - loss: 35.3877 - mae: 35.3877 - mse: 2153.2561
Epoch 7/400
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - loss: 32.5321 - mae: 32.5321 - mse: 1772.3903
Epoch 8/400
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/s

<keras.src.callbacks.history.History at 0x7aa3b9b9ebd0>

In [None]:
### Run this code after training your model ###

# Evaluate the model with standardized features
loss = model.evaluate(X_test, Y_test) # You might change the model's name
print('Loss:', loss)

# To save the performance
performance_C['(0.01, 1000, 400)'] = loss

282/282 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 28.8260 - mae: 28.8260 - mse: 1299.7816
Loss: [28.883014678955078, 28.883014678955078, 1301.1168212890625]


4) (learning_rates, batch_size, epochs) = (0.01, 500, 400)

In [None]:
### To be coded by students ###
random.seed(1234) # For replicability
model = keras.Sequential([
    Dense(30, activation=tf.nn.relu, input_shape=(X_train.shape[1],)),
    Dense(30, activation=tf.nn.relu),
    Dense(30, activation=tf.nn.relu),
    Dense(1)  # Output layer
])

# Compile the model
optimizer = Adam(learning_rate=0.01)
model.compile(loss='mae', optimizer=optimizer, metrics=['mae', 'mse'])

# Train the model
model.fit(X_train, Y_train, batch_size=500, epochs=400, verbose=1)

Epoch 1/400


  super().__init__(activity_regularizer=activity_regularizer, **kwargs)


42/42 ━━━━━━━━━━━━━━━━━━━━ 2s 3ms/step - loss: 60.5580 - mae: 60.5580 - mse: 5970.5229
Epoch 2/400
42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - loss: 44.6526 - mae: 44.6526 - mse: 3277.0359
Epoch 3/400
42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - loss: 36.0902 - mae: 36.0902 - mse: 2167.1453
Epoch 4/400
42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - loss: 31.9141 - mae: 31.9141 - mse: 1675.2640
Epoch 5/400
42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - loss: 31.1522 - mae: 31.1522 - mse: 1599.3044
Epoch 6/400
42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - loss: 30.3142 - mae: 30.3142 - mse: 1493.3417
Epoch 7/400
42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - loss: 29.9169 - mae: 29.9169 - mse: 1445.6031
Epoch 8/400
42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/s

<keras.src.callbacks.history.History at 0x7aa3b9c42150>
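
The progress bars above show 21 steps per epoch at batch size 1000 and 42 at batch size 500. That matches steps = ceil(n_train / batch_size), so halving the batch size doubles the number of weight updates for the same number of epochs. The sketch below works through that arithmetic. The value of `n_train` is an assumption (the actual training-set size is not printed in this section), and the helper names are illustrative only.

```python
import math

def steps_per_epoch(n_samples, batch_size):
    """Number of gradient updates in one pass over the data (last batch may be partial)."""
    return math.ceil(n_samples / batch_size)

def total_updates(n_samples, batch_size, epochs):
    """Total number of weight updates over the whole training run."""
    return steps_per_epoch(n_samples, batch_size) * epochs

# Hypothetical training-set size: any value from 20501 to 21000 reproduces
# the 21 and 42 steps per epoch shown in the progress bars.
n_train = 21000

for batch_size, epochs in [(1000, 400), (500, 400)]:
    print(batch_size,
          steps_per_epoch(n_train, batch_size),
          total_updates(n_train, batch_size, epochs))
```

With the same learning rate, the batch-size-500 run therefore takes twice as many steps as the batch-size-1000 run, which may be one reason the two end at different losses.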

In [None]:
### Run this code after training your model ###

# Evaluate the model with standardized features
loss = model.evaluate(X_test, Y_test) # You might change the model's name
print('Loss:', loss)

# To save the performance
performance_C['(0.01, 500, 400)'] = loss

282/282 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - loss: 29.5365 - mae: 29.5365 - mse: 1375.1532
Loss: [29.598207473754883, 29.598207473754883, 1376.0191650390625]


**Do you find any improvement compared to the model in Question A2?**

* **Your answer:** From the comparison below, (0.001, 1000, 400) returned the lowest MSE of the four combinations (1280.33), slightly below the A2 model's 1281.68. Loss and MAE are close across all four (28.61 to 29.60), so I use MSE as the deciding factor. Thus, (0.001, 1000, 400) is the best-performing model in C1, although its MAE (28.61) is marginally higher than A2's (28.56).

**(In Question C2, I changed the combination to (0.001, 500, 300), which returned a test MAE of 28.48 and a test MSE of about 1269, making it the best-performing model.)**
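
To make the comparison with A2 explicit, the differences can be computed directly. The sketch below is not part of the assignment code. It copies the test results reported in this notebook into plain dictionaries and subtracts the A2 baseline. The helper name `deltas_vs_baseline` is illustrative only.

```python
# Test results copied from the notebook outputs: [loss (MAE), MAE, MSE]
baseline_A2 = [28.562971115112305, 28.562971115112305, 1281.6761474609375]

results_C1 = {
    '(0.01, 1000, 200)': [28.880550384521484, 28.880550384521484, 1300.154541015625],
    '(0.001, 1000, 400)': [28.607107162475586, 28.607107162475586, 1280.333984375],
    '(0.01, 1000, 400)': [28.883014678955078, 28.883014678955078, 1301.1168212890625],
    '(0.01, 500, 400)': [29.598207473754883, 29.598207473754883, 1376.0191650390625],
}

def deltas_vs_baseline(results, baseline):
    """Return {name: (MAE difference, MSE difference)}; negative means better than the baseline."""
    return {name: (vals[1] - baseline[1], vals[2] - baseline[2])
            for name, vals in results.items()}

for name, (d_mae, d_mse) in deltas_vs_baseline(results_C1, baseline_A2).items():
    print(f"{name}: dMAE = {d_mae:+.3f}, dMSE = {d_mse:+.3f}")
```

A negative difference means the combination beats A2 on that metric, so the sign pattern shows at a glance which metric, if any, improved.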

In [None]:
# FYI: Model Performances in Section A
performance_A

{'(20 x 1)': [32.059452056884766, 32.059452056884766, 1678.92333984375],
 '(30 x 3)': [28.562971115112305, 28.562971115112305, 1281.6761474609375]}

In [None]:
# FYI: Model Performances in Section C
print(performance_C['(0.001, 1000, 400)'])
print(performance_C['(0.01, 1000, 200)'])
print(performance_C['(0.01, 1000, 400)'])
print(performance_C['(0.01, 500, 400)'])

[28.607107162475586, 28.607107162475586, 1280.333984375]
[28.880550384521484, 28.880550384521484, 1300.154541015625]
[28.883014678955078, 28.883014678955078, 1301.1168212890625]
[29.598207473754883, 29.598207473754883, 1376.0191650390625]


**Question C2. Find any model that performs better than all the models in Question A2 and Question C1 by changing the (learning_rates, batch_size, epochs) combination only.**

In [None]:
### [One Example for Students] ###
random.seed(1234) # For replicability
model = keras.Sequential([
    Dense(30, activation=tf.nn.relu, input_shape=(X_train.shape[1],)),
    Dense(30, activation=tf.nn.relu),
    Dense(30, activation=tf.nn.relu),
    Dense(1)  # Output layer
])

# Compile the model
optimizer = Adam(learning_rate=0.001)
model.compile(loss='mae', optimizer=optimizer, metrics=['mae', 'mse'])

# Train the model
model.fit(X_train, Y_train, batch_size=500, epochs=300, verbose=1)

Epoch 1/300


  super().__init__(activity_regularizer=activity_regularizer, **kwargs)


42/42 ━━━━━━━━━━━━━━━━━━━━ 2s 4ms/step - loss: 65.7770 - mae: 65.7770 - mse: 6935.0376
Epoch 2/300
42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - loss: 62.6482 - mae: 62.6482 - mse: 6389.0435
Epoch 3/300
42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - loss: 54.1923 - mae: 54.1923 - mse: 4844.1606
Epoch 4/300
42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 4ms/step - loss: 49.8040 - mae: 49.8040 - mse: 4073.0830
Epoch 5/300
42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - loss: 47.1292 - mae: 47.1292 - mse: 3609.6509
Epoch 6/300
42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - loss: 45.0039 - mae: 45.0039 - mse: 3260.9558
Epoch 7/300
42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - loss: 43.7430 - mae: 43.7430 - mse: 3070.0986
Epoch 8/300
42/42 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/s

<keras.src.callbacks.history.History at 0x7aa3aded75d0>

In [None]:
### Run this code after training your model ###

# Evaluate the model with standardized features
loss = model.evaluate(X_test, Y_test) # You might change the model's name
print('Loss:', loss)

# To save the performance
performance_C['(0.001, 500, 300)'] = loss

282/282 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 28.3161 - mae: 28.3161 - mse: 1261.6375
Loss: [28.4807071685791, 28.4807071685791, 1269.1785888671875]


In [None]:
# FYI: Model Performances in Section A
performance_A

{'(20 x 1)': [32.059452056884766, 32.059452056884766, 1678.92333984375],
 '(30 x 3)': [28.562971115112305, 28.562971115112305, 1281.6761474609375]}

In [None]:
# FYI: Model Performances in Section C
performance_C

{'(0.01, 1000, 200)': [28.880550384521484,
  28.880550384521484,
  1300.154541015625],
 '(0.001, 1000, 400)': [28.607107162475586,
  28.607107162475586,
  1280.333984375],
 '(0.01, 1000, 400)': [28.883014678955078,
  28.883014678955078,
  1301.1168212890625],
 '(0.01, 500, 400)': [29.598207473754883,
  29.598207473754883,
  1376.0191650390625],
 '(0.001, 500, 400)': [28.600372314453125,
  28.600372314453125,
  1281.93505859375],
 '(0.001, 500, 300)': [28.4807071685791, 28.4807071685791, 1269.1785888671875]}
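
The trial-and-error search in Question C2 can also be organized as a loop over candidate combinations. The following is a minimal sketch, not the notebook's actual procedure. `search` and `train_and_evaluate` are hypothetical names, and the callback is assumed to build, fit, and evaluate a model and return `[loss, mae, mse]`. To keep the sketch runnable without retraining, the demonstration replays the six test results already recorded in `performance_C`.

```python
from itertools import product

def search(combos, train_and_evaluate):
    """Evaluate each (learning_rate, batch_size, epochs) combo.

    Returns (best_combo, results), where best_combo has the lowest test loss (index 0).
    """
    results = {}
    for lr, batch_size, epochs in combos:
        results[(lr, batch_size, epochs)] = train_and_evaluate(lr, batch_size, epochs)
    best = min(results, key=lambda combo: results[combo][0])
    return best, results

# A hypothetical full grid: 2 learning rates x 2 batch sizes x 3 epoch counts
grid = list(product([0.001, 0.01], [500, 1000], [200, 300, 400]))

# Stand-in for real training: replay the results recorded in this notebook
recorded = {
    (0.01, 1000, 200): [28.880550384521484, 28.880550384521484, 1300.154541015625],
    (0.001, 1000, 400): [28.607107162475586, 28.607107162475586, 1280.333984375],
    (0.01, 1000, 400): [28.883014678955078, 28.883014678955078, 1301.1168212890625],
    (0.01, 500, 400): [29.598207473754883, 29.598207473754883, 1376.0191650390625],
    (0.001, 500, 400): [28.600372314453125, 28.600372314453125, 1281.93505859375],
    (0.001, 500, 300): [28.4807071685791, 28.4807071685791, 1269.1785888671875],
}

best, _ = search(recorded.keys(), lambda lr, bs, ep: recorded[(lr, bs, ep)])
print(best, recorded[best])
```

In the notebook itself, the lambda would be replaced by a function that trains the Keras model, and `grid` could be passed in place of `recorded.keys()`. Each extra combination costs one full training run.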

##### **D. Dropout Rates (25 pts)**

**Question D1. You're going to make only one change to the model in Question A2. Please include dropout rates of 10% for each hidden layer. Do you find any improvement compared to the model in Question A2?**
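
As background for this section, the sketch below illustrates what a dropout rate of 10% means, using plain Python rather than Keras. During training, each activation is zeroed with probability `rate` and the survivors are scaled by 1 / (1 - rate) so the expected activation is unchanged. At evaluation time the layer passes values through untouched. The function `dropout` here is an illustrative stand-in, not the Keras `Dropout` layer itself.

```python
import random

def dropout(activations, rate, training=True, rng=None):
    """Inverted dropout on a list of activations (illustrative, not the Keras layer)."""
    if not training or rate == 0.0:
        return list(activations)      # identity at evaluation time
    rng = rng or random.Random()
    scale = 1.0 / (1.0 - rate)        # rescale survivors to keep the expected value
    return [0.0 if rng.random() < rate else a * scale for a in activations]

rng = random.Random(1234)
hidden = [1.0] * 10000                # hypothetical activations from one hidden layer

dropped = dropout(hidden, rate=0.1, rng=rng)
zero_fraction = dropped.count(0.0) / len(dropped)
mean_activation = sum(dropped) / len(dropped)

print(f"fraction zeroed: {zero_fraction:.3f}, mean after dropout: {mean_activation:.3f}")
```

The fraction zeroed should land close to 0.1 and the mean close to 1.0. Because the noise is applied only during training, the training loss shown in the progress bar and the test loss from `model.evaluate` are not computed under the same conditions.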

In [None]:
### To be coded by students ###
random.seed(1234) # For replicability
model = keras.Sequential([
    Dense(30, activation=tf.nn.relu, input_shape=(X_train.shape[1],)),
    Dropout(0.1),
    Dense(30, activation=tf.nn.relu),
    Dropout(0.1),
    Dense(30, activation=tf.nn.relu),
    Dropout(0.1),
    Dense(1)  # Output layer
])
optimizer = Adam(learning_rate=0.001)
model.compile(loss='mae', optimizer=optimizer, metrics=['mae', 'mse'])
model.fit(X_train, Y_train, batch_size=1000, epochs=200, verbose=1)

Epoch 1/200


  super().__init__(activity_regularizer=activity_regularizer, **kwargs)


21/21 ━━━━━━━━━━━━━━━━━━━━ 2s 7ms/step - loss: 65.3223 - mae: 65.3224 - mse: 6827.4106
Epoch 2/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - loss: 65.0090 - mae: 65.0090 - mse: 6801.2148
Epoch 3/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - loss: 63.2987 - mae: 63.2987 - mse: 6471.9336
Epoch 4/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - loss: 58.0690 - mae: 58.0690 - mse: 5535.9028
Epoch 5/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - loss: 52.6868 - mae: 52.6868 - mse: 4566.3818
Epoch 6/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - loss: 50.4209 - mae: 50.4209 - mse: 4175.5791
Epoch 7/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - loss: 48.9200 - mae: 48.9200 - mse: 3937.6699
Epoch 8/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 9ms/s

<keras.src.callbacks.history.History at 0x7aa3c350fb50>

In [None]:
### Run this code after training your model ###

# Evaluate the model with standardized features
loss = model.evaluate(X_test, Y_test) # You might change the model's name
print('Loss:', loss)

# To save the performance
performance_D = {}
performance_D['10%/10%/10%'] = loss

282/282 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 28.9474 - mae: 28.9474 - mse: 1342.6803
Loss: [29.063486099243164, 29.063486099243164, 1334.5682373046875]


In [None]:
# FYI: Model Performances in Section A
print(performance_A['(30 x 3)'])
print(performance_D['10%/10%/10%'])

[28.562971115112305, 28.562971115112305, 1281.6761474609375]
[29.063486099243164, 29.063486099243164, 1334.5682373046875]


**Do you find any improvement compared to the model in Question A2?**

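* **Your answer:** No. With 10% dropout after every hidden layer, the test MAE rose from 28.56 to 29.06 and the test MSE from 1281.68 to 1334.57, so this model performs worse than the model in Question A2.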

**Question D2. Try different combinations of dropout rates for hidden layers: (0%, 10%, 0%), (5%, 5%, 5%), (0%, 5%, 0%). Do you find any improvement compared to the models in Question A2 and Question D1?**

1) (0%, 10%, 0%)

In [None]:
### To be coded by students ###
random.seed(1234) # For replicability
model = keras.Sequential([
    Dense(30, activation=tf.nn.relu, input_shape=(X_train.shape[1],)),
    Dense(30, activation=tf.nn.relu),
    Dropout(0.1),
    Dense(30, activation=tf.nn.relu),
    Dense(1)  # Output layer
])
optimizer = Adam(learning_rate=0.001)
model.compile(loss='mae', optimizer=optimizer, metrics=['mae', 'mse'])
model.fit(X_train, Y_train, batch_size=1000, epochs=200, verbose=1)

Epoch 1/200


  super().__init__(activity_regularizer=activity_regularizer, **kwargs)


21/21 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - loss: 65.7564 - mae: 65.7564 - mse: 6922.7852
Epoch 2/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - loss: 65.7010 - mae: 65.7010 - mse: 6926.1621
Epoch 3/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - loss: 64.6354 - mae: 64.6354 - mse: 6726.7544
Epoch 4/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 61.2646 - mae: 61.2646 - mse: 6091.3638
Epoch 5/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 55.1359 - mae: 55.1359 - mse: 4985.9121
Epoch 6/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - loss: 52.0472 - mae: 52.0472 - mse: 4489.1016
Epoch 7/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 51.0626 - mae: 51.0626 - mse: 4360.7749
Epoch 8/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/s

<keras.src.callbacks.history.History at 0x7aa3c34f04d0>

In [None]:
### Run this code after training your model ###

# Evaluate the model with standardized features
loss = model.evaluate(X_test, Y_test) # You might change the model's name
print('Loss:', loss)

# To save the performance
performance_D['0%/10%/0%'] = loss

282/282 ━━━━━━━━━━━━━━━━━━━━ 2s 5ms/step - loss: 28.5629 - mae: 28.5629 - mse: 1288.4950
Loss: [28.683971405029297, 28.683971405029297, 1288.48583984375]


2) (5%, 5%, 5%)

In [None]:
### To be coded by students ###
random.seed(1234) # For replicability
model = keras.Sequential([
    Dense(30, activation=tf.nn.relu, input_shape=(X_train.shape[1],)),
    Dropout(0.05),
    Dense(30, activation=tf.nn.relu),
    Dropout(0.05),
    Dense(30, activation=tf.nn.relu),
    Dropout(0.05),
    Dense(1)  # Output layer
])
optimizer = Adam(learning_rate=0.001)
model.compile(loss='mae', optimizer=optimizer, metrics=['mae', 'mse'])
model.fit(X_train, Y_train, batch_size=1000, epochs=200, verbose=1)

  super().__init__(activity_regularizer=activity_regularizer, **kwargs)


Epoch 1/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 6s 14ms/step - loss: 65.5105 - mae: 65.5105 - mse: 6914.4204
Epoch 2/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 15ms/step - loss: 64.7957 - mae: 64.7957 - mse: 6741.3169
Epoch 3/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 1s 17ms/step - loss: 62.7421 - mae: 62.7421 - mse: 6315.2354
Epoch 4/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 17ms/step - loss: 58.7692 - mae: 58.7692 - mse: 5657.3071
Epoch 5/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 7ms/step - loss: 52.5579 - mae: 52.5579 - mse: 4554.3906
Epoch 6/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 1s 19ms/step - loss: 50.5613 - mae: 50.5613 - mse: 4185.0430
Epoch 7/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 1s 15ms/step - loss: 48.4272 - mae: 48.4272 - mse: 3852.4194
Epoch 8/200
21/21 ━━━━━━━━━━━━━━━━━━━━

<keras.src.callbacks.history.History at 0x7aa3b973d210>

In [None]:
### Run this code after training your model ###

# Evaluate the model with standardized features
loss = model.evaluate(X_test, Y_test) # You might change the model's name
print('Loss:', loss)

# To save the performance
performance_D['5%/5%/5%'] = loss

282/282 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - loss: 28.7987 - mae: 28.7987 - mse: 1320.7804
Loss: [28.88221549987793, 28.88221549987793, 1313.008056640625]


3) (0%, 5%, 0%)

In [None]:
### To be coded by students ###
random.seed(1234) # For replicability
model = keras.Sequential([
    Dense(30, activation=tf.nn.relu, input_shape=(X_train.shape[1],)),
    Dense(30, activation=tf.nn.relu),
    Dropout(0.05),
    Dense(30, activation=tf.nn.relu),
    Dense(1)  # Output layer
])
optimizer = Adam(learning_rate=0.001)
model.compile(loss='mae', optimizer=optimizer, metrics=['mae', 'mse'])
model.fit(X_train, Y_train, batch_size=1000, epochs=200, verbose=1)

Epoch 1/200


  super().__init__(activity_regularizer=activity_regularizer, **kwargs)


21/21 ━━━━━━━━━━━━━━━━━━━━ 2s 6ms/step - loss: 65.5276 - mae: 65.5276 - mse: 6927.1704
Epoch 2/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 65.1707 - mae: 65.1707 - mse: 6832.8848
Epoch 3/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 63.9853 - mae: 63.9853 - mse: 6625.1201
Epoch 4/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 61.5950 - mae: 61.5950 - mse: 6142.0889
Epoch 5/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 55.8859 - mae: 55.8859 - mse: 5137.6963
Epoch 6/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/step - loss: 51.8955 - mae: 51.8955 - mse: 4437.0508
Epoch 7/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 6ms/step - loss: 50.1788 - mae: 50.1788 - mse: 4168.3594
Epoch 8/200
21/21 ━━━━━━━━━━━━━━━━━━━━ 0s 5ms/s

<keras.src.callbacks.history.History at 0x7aa3b833ef10>

In [None]:
### Run this code after training your model ###

# Evaluate the model with standardized features
loss = model.evaluate(X_test, Y_test) # You might change the model's name
print('Loss:', loss)

# To save the performance
performance_D['0%/5%/0%'] = loss

282/282 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 28.4385 - mae: 28.4385 - mse: 1280.7993
Loss: [28.600372314453125, 28.600372314453125, 1281.93505859375]


**Do you find any improvement compared to the models in Question A2 and Question D1?**

* **Your answer:**
Among the four dropout configurations, (0%, 5%, 0%) has the lowest test MSE (1281.94) and the lowest MAE (28.60), so it is the best-performing dropout model and a clear improvement over the D1 model (MSE 1334.57). It does not beat the A2 model, however: A2 has a marginally lower MSE (1281.68) and MAE (28.56), so none of these dropout settings improves on A2. Because loss and MAE are close across all configurations, I used MSE as the deciding factor.

In [None]:
# FYI: Model Performances in Section A
performance_A

{'(20 x 1)': [32.059452056884766, 32.059452056884766, 1678.92333984375],
 '(30 x 3)': [28.562971115112305, 28.562971115112305, 1281.6761474609375]}

In [None]:
# FYI: Model Performances in Section D
print(performance_D['10%/10%/10%'])
print(performance_D['0%/10%/0%'])
print(performance_D['5%/5%/5%'])
print(performance_D['0%/5%/0%'])

[29.063486099243164, 29.063486099243164, 1334.5682373046875]
[28.683971405029297, 28.683971405029297, 1288.48583984375]
[28.88221549987793, 28.88221549987793, 1313.008056640625]
[28.600372314453125, 28.600372314453125, 1281.93505859375]
