## Select and analyze the dataset

First, we call PreprocessData.select_and_analyze_dataset() to prepare the input dataset and save the train and test data to files.

In [1]:
from PreprocessData import PreprocessData
preprocessData = PreprocessData()
preprocessData.select_and_analyze_dataset()

PreprocessData initialized.
Reading data from ./data/kc_house_data.csv...
Truncating data randomly to 2000 rows
Selecting this columns from the data: ['date', 'bedrooms', 'bathrooms', 'sqft_living', 'sqft_lot', 'floors', 'waterfront', 'view', 'condition', 'grade', 'yr_built', 'lat', 'long', 'price']
Removing missing values from columns: ['date', 'bedrooms', 'bathrooms', 'sqft_living', 'sqft_lot', 'floors', 'waterfront', 'view', 'condition', 'grade', 'yr_built', 'lat', 'long', 'price']
Removing outliers values from columns: []
Splitting data: train_data (1600) and test_data (400)
Creating ColumnTransformer
Fit train_data
Transforming train_data and test_data with columns: '['date__days_since_first_date' 'bedrooms__bedrooms' 'bathrooms__bathrooms'
 'sqft_living__sqft_living' 'sqft_lot__sqft_lot' 'floors__floors_1.0'
 'floors__floors_1.5' 'floors__floors_2.0' 'floors__floors_2.5'
 'floors__floors_3.0' 'floors__floors_3.5' 'waterfront__waterfront_1'
 'view__view_0' 'view__view_1' 'view__vi

## Hyperparameter comparison and selection

We explore part of the hyperparameter space by trying several combinations and 
evaluating the quality of the predictions each one produces.

To do so, we load the hyperparameter combinations and the transformed training dataset from files.

In [2]:
import pandas as pd
hyperparameters = pd.read_csv("data/neural_network_parameters.csv")
print(hyperparameters)

    Number of Layers     Layer Structure  Num Epochs  Learning Rate  Momentum  \
0                  3          [32, 5, 1]         100          0.001      0.85   
1                  4      [32, 12, 5, 1]         150          0.005      0.95   
2                  5  [32, 44, 12, 5, 1]         350          0.010      0.95   
3                  5  [32, 44, 12, 5, 1]         350          0.010      0.95   
4                  5  [32, 44, 12, 5, 1]         250          0.010      0.90   
5                  5  [32, 44, 12, 5, 1]         250          0.001      0.95   
6                  5  [32, 44, 12, 5, 1]         350          0.010      0.95   
7                  4      [32, 32, 5, 1]         350          0.010      0.95   
8                  4      [32, 44, 7, 1]         250          0.010      0.95   
9                  5  [32, 44, 12, 5, 1]         250          0.010      0.95   
10                 3          [32, 5, 1]         180          0.005      0.85   

   Activation Function  
0 

In [3]:
X_in, y_in = preprocessData.read_transformed_data_from_file()
print(X_in[:1])
print(y_in[:1])

Reading X and y data from file './data/transformed_train_matrix.csv' with target 'price__price'
[[0.30319149 0.63092975 0.21875    0.10521739 0.00652308 1.
  0.         0.         0.         0.         0.         0.
  1.         0.         0.         0.         0.         0.5
  0.4        0.57391304 0.65979557 0.2078922  0.99710744 0.90097191
  0.93686882 0.98647746 0.94869553 0.9678369  0.86959061 0.94195378
  0.78141002 0.98446017]]
[0.04393443]


For each hyperparameter combination, we:
- create a new instance of NeuralNet with those hyperparameters,
- call NeuralNet.fit() with X_in (instances) and y_in (ground truth target values) to train our neural network,
- call NeuralNet.predict() to obtain the estimated target values (y).
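
Each row of the hyperparameter file stores the layer structure as text, so it has to be converted to a list before it can be passed to the network. The sketch below shows one way to do that while loading, using `ast.literal_eval` as a `read_csv` converter. The two-row CSV excerpt is a hypothetical stand-in that copies the column layout printed above.

```python
import ast
import io

import pandas as pd

# Hypothetical two-row excerpt in the same layout as neural_network_parameters.csv.
csv_text = '''Number of Layers,Layer Structure,Num Epochs,Learning Rate,Momentum,Activation Function
3,"[32, 5, 1]",100,0.001,0.85,relu
4,"[32, 12, 5, 1]",150,0.005,0.95,tanh
'''

# literal_eval only accepts Python literals, so it cannot execute arbitrary code.
hyperparameters_demo = pd.read_csv(
    io.StringIO(csv_text),
    converters={"Layer Structure": ast.literal_eval},
)

for i, params in hyperparameters_demo.iterrows():
    layer_sizes = params["Layer Structure"]
    # Sanity check: the declared layer count should match the structure.
    assert len(layer_sizes) == params["Number of Layers"]
    print(i, layer_sizes, params["Activation Function"])
```

Parsing at load time means a malformed row fails early, before any training starts.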

In [4]:
import ast
from NeuralNet import NeuralNet

neural_net_result_params = {}
for i, params in hyperparameters.iterrows():
    print(f"--- Combination {i} ---")
    neural_net = NeuralNet(
        L = params["Number of Layers"],
        n = ast.literal_eval(params["Layer Structure"]),  # Safely parse the string into a list
        n_epochs = params["Num Epochs"],
        learning_rate = params["Learning Rate"],
        momentum = params["Momentum"],
        activation_function = params["Activation Function"],
        validation_split = 0.2
    )

    neural_net.fit(X_in, y_in)
    y_pred = neural_net.predict(X_in)
    epoch_loss = neural_net.loss_epochs()

    neural_net_result_params[i] = {
        "Combination": i,
        "Hyperparameters": params.to_dict(),
        "Y_pred": y_pred,
        "Epoch_loss": epoch_loss
    }

--- Combination 0 ---
NeuralNet initialized with self.L = '3', self.n = '[32, 5, 1]', self.n_epochs = '100', self.learning_rate = '0.001', self.momentum = '0.85', self.fact = 'relu', self.validation_split = '0.2'
Executing fit(X, y)
Executing predict(X)
Executing loss_epochs()
--- Combination 1 ---
NeuralNet initialized with self.L = '4', self.n = '[32, 12, 5, 1]', self.n_epochs = '150', self.learning_rate = '0.005', self.momentum = '0.95', self.fact = 'tanh', self.validation_split = '0.2'
Executing fit(X, y)
Executing predict(X)
Executing loss_epochs()
--- Combination 2 ---
NeuralNet initialized with self.L = '5', self.n = '[32, 44, 12, 5, 1]', self.n_epochs = '350', self.learning_rate = '0.01', self.momentum = '0.95', self.fact = 'tanh', self.validation_split = '0.2'
Executing fit(X, y)
Executing predict(X)
Executing loss_epochs()
--- Combination 3 ---
NeuralNet initialized with self.L = '5', self.n = '[32, 44, 12, 5, 1]', self.n_epochs = '350', self.learning_rate = '0.01', self.mome

neural_net_result_params is a dictionary that holds, for each hyperparameter combination, the hyperparameters used and the resulting predictions. As an example, we show the content of the first element (0):

In [None]:
print(neural_net_result_params[0])

Now we can calculate the MSE (Mean Squared Error), MAE (Mean Absolute Error) and MAPE (Mean Absolute Percentage Error), and compare the results.
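
For reference, these are the standard definitions of the three measures, with $y_i$ the true target, $\hat{y}_i$ the prediction and $N$ the number of instances. The MAPE average is written over the set $S$ of instances with a non-zero target, as the `metrics` helper later in this notebook does; the symbols are notation chosen here, not taken from the code.

```latex
\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2
\qquad
\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right|
\qquad
\mathrm{MAPE} = \frac{1}{|S|}\sum_{i \in S}\left|\frac{y_i - \hat{y}_i}{y_i}\right|,
\quad S = \{\, i : y_i \neq 0 \,\}
```

MAPE is a fraction in this form; multiplying by 100 gives a percentage.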

After running the predictions, if a combination produced NaN values, we delete its result.

In [None]:
import numpy as np

# Iterate over a snapshot of the keys so entries can be deleted safely
for key in list(neural_net_result_params):
    if np.isnan(neural_net_result_params[key]["Y_pred"]).any():
        print(f"Handling NaN in Combination {neural_net_result_params[key]['Combination']}")
        del neural_net_result_params[key]

In [None]:
from sklearn.metrics import mean_squared_error, mean_absolute_error

for key, result in neural_net_result_params.items():
    y_pred = result["Y_pred"]
    
    mse = mean_squared_error(y_in, y_pred)
    mae = mean_absolute_error(y_in, y_pred)
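    # MAPE over non-zero targets only (as a fraction, not a percentage)
    y_true_arr = np.asarray(y_in).reshape(-1)
    y_pred_arr = np.asarray(y_pred).reshape(-1)
    mask = y_true_arr != 0
    mape = np.mean(np.abs((y_true_arr[mask] - y_pred_arr[mask]) / y_true_arr[mask]))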

    result["MSE"] = mse
    result["MAE"] = mae
    result["MAPE"] = mape
    

For each combination of hyperparameters we now have values of MSE, MAE and MAPE. 

In [None]:
print(neural_net_result_params[0]["MSE"])

We can now compare the performance of the hyperparameter combinations:

In [None]:
data = []
for key, result in neural_net_result_params.items():
    hyperparams = result["Hyperparameters"]
    data.append({
        "Combination": key,
        "Number of Layers": hyperparams["Number of Layers"],
        "Layer Structure": hyperparams["Layer Structure"],
        "Num Epochs": hyperparams["Num Epochs"],
        "Learning Rate": hyperparams["Learning Rate"],
        "Momentum": hyperparams["Momentum"],
        "Activation Function": hyperparams["Activation Function"],
        "MAPE": result["MAPE"],
        "MAE": result["MAE"],
        "MSE": result["MSE"]
    })

hyperparameters_performance_results = pd.DataFrame(data)
hyperparameters_performance_results = hyperparameters_performance_results.sort_values(
    by=["MAPE"],
    ascending=[True]
)
hyperparameters_performance_results.to_csv("./data/hyperparameters_performance_results.csv", index=False)
print("Data frame saved to 'hyperparameters_performance_results.csv' with the following columns:")
print(hyperparameters_performance_results)


Next, we show scatter plots of predicted vs. real values for the three best combinations.

First, we sort the results by MAE and keep the three combinations with the lowest error:

In [None]:
neural_net_result_params_first_items = sorted(neural_net_result_params.items(), key=lambda x: x[1]["MAE"])[:3]
y_pred_bp = neural_net_result_params_first_items[0]  # (key, result) tuple of the best combination

In [None]:
import matplotlib.pyplot as plt

# n = len(neural_net_result_params)
n = 3
fig, axes = plt.subplots(n, 1, figsize=(5, 3*n))
if n == 1:
    axes = [axes]

for i, (key, value) in enumerate(neural_net_result_params_first_items):
    Yi_pred = value["Y_pred"]
    Yi_pred = Yi_pred.reshape(-1)

    axes[i].scatter(y_in, Yi_pred)
    axes[i].set_title(f'Execution {value["Combination"]}')
    axes[i].set_xlabel('Real Y')
    axes[i].set_ylabel('Predicted Y')
plt.tight_layout()
plt.show()

These are the line plots of the training and validation error as a function
of the number of epochs for the same three combinations (lowest MAE):

In [None]:
import matplotlib.pyplot as plt

# n = len(neural_net_result_params)
n = 3
fig, axes = plt.subplots(n, 1, figsize=(7, 3*n))
if n == 1:
    axes = [axes]

for i, (key, value) in enumerate(neural_net_result_params_first_items):
    training_errors, validation_errors = value["Epoch_loss"]
    epochs = range(1, len(training_errors) + 1)
    
    axes[i].plot(epochs, training_errors, label='Training Error', marker='o')
    axes[i].plot(epochs, validation_errors, label='Validation Error', marker='s')
    axes[i].set_title(f'Execution {value["Combination"]}')
    axes[i].set_xlabel('Epoch')
    axes[i].set_ylabel('Error')
    axes[i].legend()  # show the Training/Validation labels defined above
plt.tight_layout()
plt.show()
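
When reading these curves, a common check is where the validation error is lowest and how far it drifts from the training error afterwards. The sketch below does that on made-up loss values; in the notebook the two sequences would come from `value["Epoch_loss"]`, and the names `best_epoch` and `gap` are introduced here only for illustration.

```python
import numpy as np

# Made-up loss curves, one value per epoch.
training_errors = np.array([0.90, 0.50, 0.30, 0.20, 0.15, 0.12])
validation_errors = np.array([0.95, 0.60, 0.40, 0.35, 0.38, 0.45])

# Epoch (1-based) at which the validation error is lowest.
best_epoch = int(np.argmin(validation_errors)) + 1

# Difference between validation and training error at every epoch.
gap = validation_errors - training_errors

print(f"Lowest validation error at epoch {best_epoch}")
print(f"Train/validation gap at the last epoch: {gap[-1]:.2f}")
```

A validation error that rises while the training error keeps falling is the usual sign of overfitting.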

## Model result comparison

First, we are going to use the sklearn library to obtain new predictions from a linear regression model. We train the model:

In [None]:
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X_train_comp, X_test_comp, y_train_comp, y_test_comp = train_test_split(X_in, y_in, test_size=0.2, random_state=31)
linear_regression_model = LinearRegression(fit_intercept=True, n_jobs=None)
linear_regression_model.fit(X_train_comp, y_train_comp)


Then we predict the target values for the whole dataset:

In [None]:
y_pred_mlr = linear_regression_model.predict(X_in)
y_pred_mlr[:1]
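
Because the prediction above covers all of `X_in`, it includes the rows the model was trained on. As a complementary check, the error can be measured separately on the held-out split. The sketch below does this on synthetic data (a stand-in for `X_in` and `y_in`); the coefficients and noise level are made up.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic regression data: a linear signal plus a little noise.
rng = np.random.default_rng(31)
X = rng.random((200, 5))
y = X @ np.array([0.5, -0.2, 0.1, 0.3, 0.0]) + rng.normal(0, 0.05, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=31)
model = LinearRegression().fit(X_train, y_train)

# Compare the error on rows seen during training with the error on unseen rows.
mae_train = mean_absolute_error(y_train, model.predict(X_train))
mae_test = mean_absolute_error(y_test, model.predict(X_test))
print(f"MAE on training rows: {mae_train:.4f}")
print(f"MAE on held-out rows: {mae_test:.4f}")
```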

The next step is to obtain new predictions from a multi-layer neural network using the Keras library. To do so, we:
1. Define new hyperparameters.
2. Build the multi-layer model.
3. Train the model.
4. Predict new y values.

Note: There may be an expected warning message related to urllib3 v2 when the tensorflow library is imported.

In [None]:
import tensorflow as tf

keras_input_dim = X_train_comp.shape[1]
keras_hidden_units_1 = 12
keras_hidden_units_2 = 7
keras_activation = 'relu'
keras_output_activation = 'linear'
keras_optimizer = tf.keras.optimizers.Adam(learning_rate=0.001) 
keras_loss = 'mse'
keras_epochs = 100
keras_batch_size = 32 # open question: should this be the full batch, X_train_comp.shape[0]?
print(f"Checking number of features for input layer: {keras_input_dim}")

In [None]:
from tensorflow.keras import layers, models

keras_model = models.Sequential([
    layers.Input(shape=(keras_input_dim,)),
    layers.Dense(keras_hidden_units_1, activation=keras_activation),
    layers.Dense(keras_hidden_units_2, activation=keras_activation),
    layers.Dense(1, activation=keras_output_activation)
])

keras_model.compile(optimizer = keras_optimizer, loss = keras_loss, metrics = ['mae'])
print(f"Checking multi-layer model is built: {keras_model}")

In [None]:
from sklearn.preprocessing import StandardScaler

keras_scaler = StandardScaler()
X_train_comp_scaled = keras_scaler.fit_transform(X_train_comp) # Should we transform X again?
keras_history = keras_model.fit(X_train_comp_scaled, y_train_comp,
                          epochs = keras_epochs,
                          batch_size = keras_batch_size,
                          validation_split = 0.2,
                          verbose = 1)


In [None]:
X_in_scaled = keras_scaler.transform(X_in)  # reuse the scaler fitted on the training split
y_pred_keras = keras_model.predict(X_in_scaled).flatten()
y_pred_keras[:1]
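
A scaler fitted on the training split is normally applied to later data with `transform()`, so that the model sees inputs on the scale it was trained on. The sketch below contrasts that with re-fitting on the new data, using two tiny made-up arrays.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

X_train = np.array([[0.0], [2.0], [4.0]])   # mean 2
X_new = np.array([[10.0], [12.0], [14.0]])  # mean 12, same spread

scaler = StandardScaler().fit(X_train)

# Reusing the fitted statistics keeps the new data on the training scale.
reused = scaler.transform(X_new)

# Re-fitting centres the new data on its own mean and hides the shift.
refit = StandardScaler().fit_transform(X_new)

print(reused.ravel())
print(refit.ravel())
```

With the reused scaler the new values stay far above zero, which reflects that they lie outside the training range; after re-fitting they are centred on zero and that information is lost.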

Now we calculate MAE, MSE and MAPE for both y_pred_mlr and y_pred_keras.

In [None]:
def metrics(y_true, y_pred):
    mae = np.mean(np.abs(y_true - y_pred))
    mse = np.mean((y_true - y_pred)**2)
    mask = y_true != 0
    mape = np.mean(np.abs((y_true[mask] - y_pred[mask]) / y_true[mask])) * 100
    return mae, mse, mape

y_pred_bp_as_array = y_pred_bp[1]['Y_pred'] # Extract data from structure

mae_pred_bp, mse_pred_bp, mape_pred_bp = metrics(y_in, y_pred_bp_as_array)
mae_pred_mlr, mse_pred_mlr, mape_pred_mlr = metrics(y_in, y_pred_mlr)
mae_pred_keras, mse_pred_keras, mape_pred_keras = metrics(y_in, y_pred_keras)

model_comparison_results = pd.DataFrame({
    'Error measures': ['MAE', 'MSE', 'MAPE'],
    'BP': [mae_pred_bp, mse_pred_bp, mape_pred_bp],
    'MLR-F': [mae_pred_mlr, mse_pred_mlr, mape_pred_mlr],
    'BP-F': [mae_pred_keras, mse_pred_keras, mape_pred_keras]
})

print(model_comparison_results)

Now we show scatter plots of predicted vs. real values for the BP (manual implementation), MLR-F and BP-F models.

In [None]:
plt.figure(figsize=(8, 8))
plt.scatter(y_in, y_pred_mlr, alpha=0.5, label='MLR-F')
plt.scatter(y_in, y_pred_keras, alpha=0.5, label='BP-F')
plt.scatter(y_in, y_pred_bp_as_array, alpha=0.5, label='BP (manually)')

# Plot a line for reference
min_val = min(y_in.min(), y_pred_bp_as_array.min(), y_pred_mlr.min(), y_pred_keras.min())
max_val = max(y_in.max(), y_pred_bp_as_array.max(), y_pred_mlr.max(), y_pred_keras.max())
plt.plot([min_val, max_val], [min_val, max_val], 'r--', label='1:1 line')

plt.xlabel('True Values')
plt.ylabel('Predicted Values')
plt.title('Predicted vs True Values for the three Models')
plt.legend()
plt.grid(True)
plt.show()

## Cross Validation

We used our own `NeuralNet`, `NeuralNetPredictor` and `PredictorExecutor` classes to run **k-fold cross-validation** over a 1000-instance dataset and assess the performance of the hyperparameter combinations defined in the _'neural_network_parameters.csv'_ file (also available in the cross-validation-results folder). The scores are negated errors, so values closer to zero are better.

For **Negative MAE** these are the results:

| N. of Layers | Layer Structure    | Num Epochs | Learning Rate | Momentum | Activation Func. | Mean       | Variance    |
|--------------|--------------------|------------|---------------|----------|------------------|------------|-------------|
| 3            | [32, 5, 1]         | 100        | 0.001         | 0.85     | relu             | -0.0610646 | 2.25502e-05 |
| 4            | [32, 12, 5, 1]     | 150        | 0.005         | 0.95     | tanh             | -0.0271284 | 7.43315e-05 |
| 5            | [32, 44, 12, 5, 1] | 350        | 0.01          | 0.95     | tanh             | -0.0206829 | 1.22648e-05 |
| 5            | [32, 44, 12, 5, 1] | 350        | 0.01          | 0.95     | relu             | -0.0610646 | 2.25502e-05 |
| 5            | [32, 44, 12, 5, 1] | 250        | 0.01          | 0.9      | sigmoid          | -0.0264958 | 1.74208e-05 |
| 5            | [32, 44, 12, 5, 1] | 250        | 0.001         | 0.95     | sigmoid          | -0.0283447 | 1.62428e-05 |
| 5            | [32, 44, 12, 5, 1] | 350        | 0.01          | 0.95     | sigmoid          | -0.0179797 | 6.10538e-06 |
| 4            | [32, 32, 5, 1]     | 350        | 0.01          | 0.95     | sigmoid          | -0.017877  | 7.71762e-06 |
| 4            | [32, 44, 7, 1]     | 250        | 0.01          | 0.95     | sigmoid          | -0.0171507 | 1.26287e-05 |
| 5            | [32, 44, 12, 5, 1] | 250        | 0.01          | 0.95     | sigmoid          | -0.0181939 | 9.37823e-06 |
| 3            | [32, 5, 1]         | 180        | 0.005         | 0.85     | relu             | -0.0626909 | 4.56893e-05 |

For **Negative MSE** these are the results:

| N. of Layers | Layer Structure    | Num Epochs | Learning Rate | Momentum | Activation Func. | Mean        | Variance    |
|--------------|--------------------|------------|---------------|----------|------------------|-------------|-------------|
| 3            | [32, 5, 1]         | 100        | 0.001         | 0.85     | relu             | -0.00268342 | 7.13703e-06 |
| 4            | [32, 12, 5, 1]     | 150        | 0.005         | 0.95     | tanh             | -0.00315976 | 6.13253e-06 |
| 5            | [32, 44, 12, 5, 1] | 350        | 0.01          | 0.95     | tanh             | -0.0026338  | 1.29018e-05 |
| 5            | [32, 44, 12, 5, 1] | 350        | 0.01          | 0.95     | relu             | -0.00652708 | 1.11375e-05 |
| 5            | [32, 44, 12, 5, 1] | 250        | 0.01          | 0.9      | sigmoid          | -0.00182747 | 5.5904e-06  |
| 5            | [32, 44, 12, 5, 1] | 250        | 0.001         | 0.95     | sigmoid          | -0.00260613 | 8.42557e-06 |
| 5            | [32, 44, 12, 5, 1] | 350        | 0.01          | 0.95     | sigmoid          | -0.00136719 | 2.26853e-06 |
| 4            | [32, 32, 5, 1]     | 350        | 0.01          | 0.95     | sigmoid          | -0.0012032  | 1.63084e-06 |
| 4            | [32, 44, 7, 1]     | 250        | 0.01          | 0.95     | sigmoid          | -0.00117807 | 1.89067e-06 |
| 5            | [32, 44, 12, 5, 1] | 250        | 0.01          | 0.95     | sigmoid          | -0.00143667 | 4.13801e-06 |
| 3            | [32, 5, 1]         | 180        | 0.005         | 0.85     | relu             | -0.00652708 | 1.11375e-05 |

For **Negative MAPE** these are the results:

| N. of Layers | Layer Structure    | Num Epochs | Learning Rate | Momentum | Activation Func. | Mean      | Variance   |
|--------------|--------------------|------------|---------------|----------|------------------|-----------|------------|
| 3            | [32, 5, 1]         | 100        | 0.001         | 0.85     | relu             | -1        | 0          |
| 4            | [32, 12, 5, 1]     | 150        | 0.005         | 0.95     | tanh             | -0.489261 | 0.00995061 |
| 5            | [32, 44, 12, 5, 1] | 350        | 0.01          | 0.95     | tanh             | -0.560644 | 0.0687754  |
| 5            | [32, 44, 12, 5, 1] | 350        | 0.01          | 0.95     | relu             | -1        | 0          |
| 5            | [32, 44, 12, 5, 1] | 250        | 0.01          | 0.9      | sigmoid          | -0.534834 | 0.00143487 |
| 5            | [32, 44, 12, 5, 1] | 250        | 0.001         | 0.95     | sigmoid          | -0.73224  | 0.00366142 |
| 5            | [32, 44, 12, 5, 1] | 350        | 0.01          | 0.95     | sigmoid          | -0.299791 | 0.00123542 |
| 4            | [32, 32, 5, 1]     | 350        | 0.01          | 0.95     | sigmoid          | -0.327576 | 0.00178844 |
| 4            | [32, 44, 7, 1]     | 250        | 0.01          | 0.95     | sigmoid          | -0.392357 | 0.00599126 |
| 5            | [32, 44, 12, 5, 1] | 250        | 0.01          | 0.95     | sigmoid          | -0.358972 | 0.00160058 |
| 3            | [32, 5, 1]         | 180        | 0.005         | 0.85     | relu             | -1        | 0          |
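
The Mean and Variance columns in the tables above summarize the per-fold scores of each combination. The sketch below shows how such a summary can be produced with scikit-learn's `cross_val_score`. It uses `LinearRegression` on synthetic data as a stand-in estimator; whether `PredictorExecutor` uses the same splitter or the same variance formula is an assumption.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic regression data (made up for illustration).
rng = np.random.default_rng(0)
X = rng.random((100, 4))
y = X @ np.array([0.4, 0.3, -0.2, 0.1]) + rng.normal(0, 0.02, size=100)

# Five shuffled folds; each fold is used once as the validation set.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(
    LinearRegression(), X, y, scoring="neg_mean_absolute_error", cv=cv
)

print("Per-fold scores:", scores)
print("Mean:", scores.mean())
print("Variance:", scores.var())
```

Scorers with the `neg_` prefix return the error with its sign flipped, which is why every mean in the tables is negative.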


Below is a quick demo, on a smaller subset of the data, of how to call `PredictorExecutor` to run the cross-validation. Depending on the desired scoring, use `neg_mean_absolute_error`, `neg_mean_squared_error` or `neg_mean_absolute_percentage_error`:

In [5]:
from PredictorExecutor import PredictorExecutor
from NeuralNetPredictor import NeuralNetPredictor
from tabulate import tabulate

hyperparameters = pd.read_csv("./data/neural_network_parameters.csv")

p = PreprocessData()
X_in, y_in = p.read_transformed_data_from_file()
X_in, y_in = X_in[:10], y_in[:10]

predictor_executor = PredictorExecutor()
scores_by_hyperparameters = predictor_executor.cross_validation(
    hyperparameters,
    X_in,
    y_in,
    scoring='neg_mean_absolute_error',
    folds=10
)

print(tabulate(scores_by_hyperparameters, headers='keys', tablefmt='simple'))

PreprocessData initialized.
Reading X and y data from file './data/transformed_train_matrix.csv' with target 'price__price'
NeuralNet initialized with self.L = '3', self.n = '[32, 5, 1]', self.n_epochs = '100', self.learning_rate = '0.001', self.momentum = '0.85', self.fact = 'relu', self.validation_split = '0'
Executing fit(X, y)
Executing predict(X)
Executing fit(X, y)
Executing predict(X)
Executing fit(X, y)
Executing predict(X)
Executing fit(X, y)
Executing predict(X)
Executing fit(X, y)
Executing predict(X)
Executing fit(X, y)
Executing predict(X)
Executing fit(X, y)
Executing predict(X)
Executing fit(X, y)
Executing predict(X)
Executing fit(X, y)
Executing predict(X)
Executing fit(X, y)
Executing predict(X)
Saving Cross Validation ccore list for case '0' in './data/cross_validation_with_hyperparameters_0_neg_mean_absolute_error.csv'
NeuralNet initialized with self.L = '4', self.n = '[32, 12, 5, 1]', self.n_epochs = '150', self.learning_rate = '0.005', self.momentum = '0.95', self