## Select and analyze the dataset

First, we call PreprocessData.select_and_analyze_dataset() to prepare the input dataset and save the train and test data to files.

In [1]:
from PreprocessData import PreprocessData
preprocessData = PreprocessData()
preprocessData.select_and_analyze_dataset()

PreprocessData initialized.
Reading data from ./data/kc_house_data.csv...
Truncating data randomly to 2000 rows
Selecting this columns from the data: ['date', 'bedrooms', 'bathrooms', 'sqft_living', 'sqft_lot', 'floors', 'waterfront', 'view', 'condition', 'grade', 'yr_built', 'lat', 'long', 'price']
Removing missing values from columns: ['date', 'bedrooms', 'bathrooms', 'sqft_living', 'sqft_lot', 'floors', 'waterfront', 'view', 'condition', 'grade', 'yr_built', 'lat', 'long', 'price']
Removing outliers values from columns: []
Splitting data: train_data (1600) and test_data (400)
Creating ColumnTransformer
Fit train_data
Transforming train_data and test_data
Train matrix saved to ./data/transformed_train_matrix.csv
Test matrix saved to ./data/transformed_test_matrix.csv
Transformer saved to ./data/transformer.pkl
Executed all subtasks of select and analyze dataset...
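
The log above shows the order of the steps: split first, then fit the transformer on the training data only, then transform both sets. The following is a minimal sketch of that order, not the actual PreprocessData code. It assumes min-max scaling (the real ColumnTransformer may combine other steps), and the helper names and the two-column synthetic rows are hypothetical.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of the rows and split it into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_min_max(train_rows):
    """Learn the (min, max) of every column from the training rows only."""
    columns = list(zip(*train_rows))
    return [(min(column), max(column)) for column in columns]


def transform(rows, ranges):
    """Scale every value with the ranges learned from the training rows."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(row, ranges)]
        for row in rows
    ]


# Synthetic stand-in data (assumption): 2000 rows with two numeric columns.
rows = [[500.0 + 10 * i, float(1 + i % 5)] for i in range(2000)]

train_rows, test_rows = train_test_split(rows)
ranges = fit_min_max(train_rows)            # fit on train_data only
train_matrix = transform(train_rows, ranges)
test_matrix = transform(test_rows, ranges)  # reuse the train ranges

print(len(train_matrix), len(test_matrix))  # prints: 1600 400
```

Because the ranges come from the training rows alone, a test value may fall slightly outside [0, 1]. That is expected and avoids leaking test information into the transformer.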


## Hyperparameter comparison and selection

We explore part of the hyperparameter space by trying different combinations and evaluating the quality of the predictions obtained with each one.

To do so, we load the hyperparameter combinations and the transformed training dataset from files.

In [2]:
import pandas as pd
hyperparameters = pd.read_csv("data/neural_network_parameters.csv")
print(hyperparameters)

    Number of Layers        Layer Structure  Num Epochs  Learning Rate  \
0                  3            [22, 64, 1]         100         0.0010   
1                  4       [22, 128, 64, 1]         150         0.0005   
2                  2                [22, 1]          80         0.0020   
3                  4       [22, 128, 64, 1]         120         0.0010   
4                  5  [22, 256, 128, 64, 1]         200         0.0001   
5                  4       [22, 128, 64, 1]         150         0.0005   
6                  3            [22, 64, 1]         100         0.0010   
7                  5  [22, 256, 128, 64, 1]         200         0.0002   
8                  3           [22, 128, 1]         140         0.0005   
9                  5  [22, 256, 128, 64, 1]         250         0.0001   
10                 3            [22, 64, 1]         180         0.0005   

    Momentum Activation Function  
0       0.90                relu  
1       0.95                tanh  
2     

In [3]:
X_in, y_in = preprocessData.read_transformed_data_from_file()
print(X_in[:1])
print(y_in[:1])

[[0.30319149 0.375      0.21875    0.10521739 0.00652308 1.
  0.         0.         0.         0.         0.         0.
  1.         0.         0.         0.         0.         0.5
  0.4        0.57391304 0.65979557 0.2078922 ]]
[0.04393443]
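
The printed row holds 22 values, which matches the first entry of every Layer Structure in the table above. Since that column is stored as text in the CSV file, it can be worth parsing and checking it before training. The sketch below is one possible way to do this; parse_layer_structure is a hypothetical helper, not part of NeuralNet or PreprocessData.

```python
import ast


def parse_layer_structure(text, expected_layers, n_features):
    """Parse a string such as "[22, 64, 1]" and check it for consistency."""
    # literal_eval only accepts Python literals, so it cannot run arbitrary code.
    layers = ast.literal_eval(text)
    if not isinstance(layers, list) or not all(isinstance(u, int) for u in layers):
        raise ValueError(f"Not a list of integers: {text!r}")
    if len(layers) != expected_layers:
        raise ValueError(f"Expected {expected_layers} layers, got {len(layers)}")
    if layers[0] != n_features:
        raise ValueError(f"Input layer has {layers[0]} units, data has {n_features} features")
    return layers


layers = parse_layer_structure("[22, 128, 64, 1]", expected_layers=4, n_features=22)
print(layers)  # prints: [22, 128, 64, 1]
```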


For each hyperparameter combination, we:
- create a new instance of NeuralNet with the hyperparameters,
- call the NeuralNet.fit() function with X_in (instances) and y_in (ground truth target values) to train our neural network,
- call the NeuralNet.predict() function to obtain the estimated target values (y).
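
The create, fit, predict pattern above can be sketched in isolation as follows. This is an illustration under stated assumptions, not the NeuralNet implementation: fit_slope is a hypothetical one-parameter stand-in model, the data is synthetic, and mean squared error on held-out data is assumed as the quality measure.

```python
def fit_slope(xs, ys, learning_rate, n_epochs, momentum):
    """Fit y ~ w * x by gradient descent with momentum (stand-in for fit())."""
    w, velocity = 0.0, 0.0
    for _ in range(n_epochs):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        velocity = momentum * velocity - learning_rate * grad
        w += velocity
    return w


def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Synthetic data (assumption): y = 3 * x, split into train and validation parts.
train_x = [0.0, 0.25, 0.5, 0.75, 1.0]
train_y = [3 * x for x in train_x]
val_x = [0.1, 0.6, 0.9]
val_y = [3 * x for x in val_x]

combinations = [
    {"learning_rate": 0.001, "n_epochs": 10, "momentum": 0.0},
    {"learning_rate": 0.1, "n_epochs": 200, "momentum": 0.9},
]

results = []
for params in combinations:
    w = fit_slope(train_x, train_y, **params)        # "fit"
    predictions = [w * x for x in val_x]             # "predict"
    results.append((mse(val_y, predictions), params))

# Keep the combination with the lowest validation error.
best_error, best_params = min(results, key=lambda result: result[0])
print(best_params)
```

Collecting one (error, params) pair per combination keeps the comparison separate from the training loop, so the ranking criterion can be changed without retraining.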

In [None]:
import ast
from NeuralNet import NeuralNet

# Train one network per hyperparameter combination (one row of the table).
for _, params in hyperparameters.iterrows():
    neural_net = NeuralNet(
        L=params["Number of Layers"],
        # Parse the string "[22, 64, 1]" into a list without executing arbitrary code
        n=ast.literal_eval(params["Layer Structure"]),
        n_epochs=params["Num Epochs"],
        learning_rate=params["Learning Rate"],
        momentum=params["Momentum"],
        activation_function=params["Activation Function"],
        validation_split=0.2
    )

    # X_in holds the instances, y_in the ground truth target values
    neural_net.fit(X_in, y_in)

NeuralNet initialized with self.L = '3', self.n = '[22, 64, 1]', self.n_epochs = '100', self.learning_rate = '0.001', self.momentum = '0.9', self.fact = 'relu', self.validation_split = '0.2'
Executing fit(X, y)
NeuralNet initialized with self.L = '4', self.n = '[22, 128, 64, 1]', self.n_epochs = '150', self.learning_rate = '0.0005', self.momentum = '0.95', self.fact = 'tanh', self.validation_split = '0.2'
Executing fit(X, y)
NeuralNet initialized with self.L = '2', self.n = '[22, 1]', self.n_epochs = '80', self.learning_rate = '0.002', self.momentum = '0.85', self.fact = 'sigmoid', self.validation_split = '0.2'
Executing fit(X, y)
NeuralNet initialized with self.L = '4', self.n = '[22, 128, 64, 1]', self.n_epochs = '120', self.learning_rate = '0.001', self.momentum = '0.9', self.fact = 'relu', self.validation_split = '0.2'
Executing fit(X, y)
NeuralNet initialized with self.L = '5', self.n = '[22, 256, 128, 64, 1]', self.n_epochs = '200', self.learning_rate = '0.0001', self.momentum = 