## Building a Regression Model using Keras

### Table of contents
1. Download and clean the dataset
2. Import Keras
3. Build a Neural Network
4. Train and Test the Neural Network

### Download and Clean Dataset

We will start by importing the libraries needed for the project.

In [1]:
import time
import os
import pandas as pd
import numpy as np


from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split


In [2]:
%conda install -c conda-forge keras

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.


Note: you may need to restart the kernel to use updated packages.


The dataset we will use describes the compressive strength of different concrete samples, based on the amount of each ingredient used (in kg per cubic meter of mixture) and the age of the sample in days. The ingredients are:
1. Cement
2. Blast furnace slag
3. Fly ash
4. Water
5. Superplasticizer
6. Coarse aggregate
7. Fine aggregate

We will download the data and read it into a pandas DataFrame.

In [3]:
concrete_df = pd.read_csv("/Users/juan/Downloads/concrete_data.csv")
concrete_df.head()

|   | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 | 79.99 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 | 61.89 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 40.27 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 | 41.05 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 | 44.3 |


The first sample contains 540 kg of cement, 0 kg of blast furnace slag and fly ash, 162 kg of water, 2.5 kg of superplasticizer, 1040 kg of coarse aggregate and 676 kg of fine aggregate per cubic meter of mixture, and has a compressive strength of 79.99 MPa at an age of 28 days.

Let's check how many data points the DataFrame contains.

In [4]:
concrete_df.shape

(1030, 9)

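The next step is to check the dataset for missing values, first through the summary statistics (the count row) and then by counting the null values in each column.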

In [5]:
concrete_df.describe()

|       | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| count | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 |
| mean | 281.167864 | 73.895825 | 54.18835 | 181.567282 | 6.20466 | 972.918932 | 773.580485 | 45.662136 | 35.817961 |
| std | 104.506364 | 86.279342 | 63.997004 | 21.354219 | 5.973841 | 77.753954 | 80.17598 | 63.169912 | 16.705742 |
| min | 102.0 | 0.0 | 0.0 | 121.8 | 0.0 | 801.0 | 594.0 | 1.0 | 2.33 |
| 25% | 192.375 | 0.0 | 0.0 | 164.9 | 0.0 | 932.0 | 730.95 | 7.0 | 23.71 |
| 50% | 272.9 | 22.0 | 0.0 | 185.0 | 6.4 | 968.0 | 779.5 | 28.0 | 34.445 |
| 75% | 350.0 | 142.95 | 118.3 | 192.0 | 10.2 | 1029.4 | 824.0 | 56.0 | 46.135 |
| max | 540.0 | 359.4 | 200.1 | 247.0 | 32.2 | 1145.0 | 992.6 | 365.0 | 82.6 |


In [6]:
concrete_df.isnull().sum()

Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64

The dataset has no missing values, so we can proceed to build our model.

#### Split data into predictors and target

In [7]:
concrete_df_columns = concrete_df.columns
predictors = concrete_df[concrete_df_columns[concrete_df_columns != "Strength"]] # All columns except Strength
target = concrete_df["Strength"] #Only the column Strength

Check the first rows of the predictors and target data.

In [8]:
predictors.head()

|   | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 |


In [9]:
target.head()

0    79.99
1    61.89
2    40.27
3    41.05
4    44.30
Name: Strength, dtype: float64

Let's save the number of predictors to the n_cols variable, because we will need it when we build our network.

In [10]:
n_cols = predictors.shape[1] #number of predictors

### Importing packages from Keras library

Let's import the Keras classes that we need to build our regression model.

In [11]:
from keras.models import Sequential
from keras.layers import Dense

Using TensorFlow backend.


### Part A: Build a baseline model

Use the Keras library to build a neural network with the following:

One hidden layer of 10 nodes, and a ReLU activation function

Use the adam optimizer and the mean squared error as the loss function.

Then:

1. Randomly split the data into training and test sets, holding out 30% of the data for testing. You can use the train_test_split helper function from Scikit-learn.
2. Train the model on the training data using 50 epochs.
3. Evaluate the model on the test data and compute the mean squared error between the predicted concrete strength and the actual concrete strength. You can use the mean_squared_error function from Scikit-learn.
4. Repeat steps 1 - 3, 50 times, i.e., create a list of 50 mean squared errors.
5. Report the mean and the standard deviation of the mean squared errors.
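The five steps can be outlined in plain Python before bringing in Keras. This is a minimal, dependency-free sketch under stated assumptions: the synthetic data, the helper names (`random_split`, `fit_mean_baseline`, `run_experiment`) and the mean-predicting stand-in for the neural network are all hypothetical, chosen only to show how the repeated split/train/evaluate loop produces a list of MSEs whose mean and standard deviation are reported.

```python
import random
import statistics


def random_split(rows, test_size, rng):
    """Step 1: shuffle a copy of rows and hold out a test_size fraction."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def fit_mean_baseline(train):
    """Step 2 (hypothetical stand-in for the network): always predict the mean target."""
    mean_target = statistics.mean(target for _, target in train)
    return lambda features: mean_target


def mse(pairs, predict):
    """Step 3: mean squared error between actual and predicted targets."""
    return statistics.mean((target - predict(features)) ** 2 for features, target in pairs)


def run_experiment(rows, n_repeats=50, test_size=0.3):
    """Steps 4 and 5: repeat steps 1-3, then report mean and std of the MSEs."""
    scores = []
    for i in range(n_repeats):
        rng = random.Random(i)  # a different seed gives a different split each time
        train, test = random_split(rows, test_size, rng)
        model = fit_mean_baseline(train)  # a fresh "model" for every repetition
        scores.append(mse(test, model))
    return statistics.mean(scores), statistics.pstdev(scores), scores


# Synthetic (features, target) pairs, purely for illustration.
data_rng = random.Random(0)
rows = [((x,), 2.0 * x + data_rng.gauss(0, 1)) for x in range(100)]
mean_mse, std_mse, scores = run_experiment(rows)
print(f"mean MSE = {mean_mse:.2f}, std = {std_mse:.2f} over {len(scores)} runs")
```

In the notebook, the Keras model and scikit-learn's train_test_split take the place of the stand-in model and `random_split`.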

#### Build the neural network model

Let's define a function that builds and compiles the regression model, so that we can call it whenever we need a new model.

In [12]:
#define the regression model with one hidden layer
def regression_model():
    # Create model
    model = Sequential()

    model.add(Dense(10, activation="relu", input_shape=(n_cols,)))
    model.add(Dense(1))

    # Compile model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model
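As a quick sanity check on the size of this network: a Dense layer has one weight per input-unit pair plus one bias per unit. With the 8 predictors in this dataset, the hidden layer has 8 × 10 + 10 = 90 parameters and the output layer 10 × 1 + 1 = 11, for 101 in total, which is what `model.summary()` should report. The helper below is a hypothetical illustration of that arithmetic, not part of Keras.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases in a stack of fully connected layers.

    layer_sizes lists the input width followed by the number of units in each
    Dense layer, e.g. [8, 10, 1] for 8 predictors -> 10 hidden -> 1 output.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out  # weight matrix + bias vector
    return total


print(dense_param_count([8, 10, 1]))  # 101
```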

#### Train and test the model

First, we create the regression model.

In [13]:
model = regression_model()



Randomly split the data into training and test sets, holding out 30% of the data for testing. You can use the train_test_split helper function from Scikit-learn.

In [14]:
X = predictors
y = target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=24)   
print("Training set: ", X_train.shape, y_train.shape)
print("Testing set: ", X_test.shape, y_test.shape)

Training set:  (721, 8) (721,)
Testing set:  (309, 8) (309,)


Train the model on the training data using 50 epochs

In [15]:
model.fit(X_train, y_train, epochs=50, verbose=2)

Epoch 1/50
 - 0s - loss: 82046.0770
Epoch 2/50
 - 0s - loss: 18025.4646
Epoch 3/50
 - 0s - loss: 8759.1094
Epoch 4/50
 - 0s - loss: 8335.7559
Epoch 5/50
 - 0s - loss: 7773.3083
Epoch 6/50
 - 0s - loss: 7329.9335
Epoch 7/50
 - 0s - loss: 6910.7450
Epoch 8/50
 - 0s - loss: 6485.1436
Epoch 9/50
 - 0s - loss: 6092.6331
Epoch 10/50
 - 0s - loss: 5714.7185
Epoch 11/50
 - 0s - loss: 5366.2461
Epoch 12/50
 - 0s - loss: 5020.3584
Epoch 13/50
 - 0s - loss: 4671.2692
Epoch 14/50
 - 0s - loss: 4303.6966
Epoch 15/50
 - 0s - loss: 3919.3289
Epoch 16/50
 - 0s - loss: 3559.3547
Epoch 17/50
 - 0s - loss: 3217.6885
Epoch 18/50
 - 0s - loss: 2916.7368
Epoch 19/50
 - 0s - loss: 2656.1221
Epoch 20/50
 - 0s - loss: 2446.7698
Epoch 21/50
 - 0s - loss: 2265.7666
Epoch 22/50
 - 0s - loss: 2112.0865
Epoch 23/50
 - 0s - loss: 1967.0465
Epoch 24/50
 - 0s - loss: 1835.1763
Epoch 25/50
 - 0s - loss: 1736.8659
Epoch 26/50
 - 0s - loss: 1653.7515
Epoch 27/50
 - 0s - los

<keras.callbacks.callbacks.History at 0x1a31e71cc0>

Evaluate the model on the test data and compute the mean squared error between the predicted concrete strength and the actual concrete strength. You can use the mean_squared_error function from Scikit-learn.    
    

In [16]:
y_hat = model.predict(X_test)   
mse = mean_squared_error(y_test, y_hat)

In [17]:
print(mse)

425.5978468613843
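`mean_squared_error` averages the squared differences between the actual and predicted strengths, so the result is in MPa². Taking the square root gives the root mean squared error (RMSE) in MPa, which is easier to interpret: an MSE of about 425.60 corresponds to a typical error of roughly 20.63 MPa. The plain-Python functions below are an illustrative sketch of the same formula; their names are hypothetical, not scikit-learn's.

```python
import math


def mse(y_true, y_pred):
    """Mean of squared residuals, the same formula as sklearn's mean_squared_error."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target (MPa here)."""
    return math.sqrt(mse(y_true, y_pred))


print(mse([40.0, 44.0], [37.0, 48.0]))  # (3**2 + 4**2) / 2 = 12.5
print(round(math.sqrt(425.5978468613843), 2))  # 20.63
```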


Repeat steps 1 - 3, 50 times, i.e., create a list of 50 mean squared errors

We will create one function that performs steps 1 to 3 and another function that iterates 50 times and creates the list of 50 mean squared errors

In [18]:
def get_mean_squared_error(compiled_model, X, y, epochs=50, verbose=1):
    """Run steps 1 - 3 once and return the mean squared error on the test set.
    """

    # 1. Randomly split the data into training and test sets, holding out 30%
    # of the data for testing (train_test_split from Scikit-learn).
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=24)
    print("Training set: ", X_train.shape, y_train.shape)
    print("Testing set: ", X_test.shape, y_test.shape)

    # 2. Train the given compiled model on the training data.
    compiled_model.fit(X_train, y_train, epochs=epochs, verbose=verbose)

    # 3. Evaluate the model on the test data: mean squared error between the
    # predicted and the actual concrete strength (mean_squared_error from Scikit-learn).
    y_hat = compiled_model.predict(X_test)
    mse = mean_squared_error(y_test, y_hat)

    # Return the mean squared error
    return mse
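Note that `get_mean_squared_error` always passes `random_state=24`, so every call produces exactly the same 721/309 split, and it keeps calling `fit` on the same model object, so later iterations continue training from the weights left by earlier ones. The standard-library sketch below illustrates the seeding part of that behavior, using `random.Random` as a stand-in for scikit-learn's shuffling; the helper `split_indices` is hypothetical, and it assumes (as scikit-learn does) that the test-set size is rounded up. A fixed seed repeats the split, while varying the seed per iteration (for example `random_state=i`) yields a different split each time.

```python
import math
import random


def split_indices(n_samples, test_size, seed):
    """Return (train_idx, test_idx) for a shuffled split; test size rounded up."""
    n_test = math.ceil(test_size * n_samples)
    order = list(range(n_samples))
    random.Random(seed).shuffle(order)
    return sorted(order[n_test:]), sorted(order[:n_test])


# The same seed on every call gives the identical split every time.
first = split_indices(1030, 0.3, seed=24)
again = split_indices(1030, 0.3, seed=24)
print(first == again)  # True

# A different seed per iteration gives a different split per iteration.
test_sets = [split_indices(1030, 0.3, seed=i)[1] for i in range(3)]
print(len(test_sets[0]), test_sets[0] != test_sets[1])  # 309 True
```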

In [19]:
#Function to round the calculation of the mean and std deviation to 2 decimal places
def get_round(score, num_of_digits=2):
    """Get round with given number of decimal digits 
    """
    return round(score, num_of_digits)

#Function to calculate the mean of the list of mean squared errors
def get_mean(list_of_mse_scores):
    """Get mean
    """
    if list_of_mse_scores:
        return get_round(np.mean(list_of_mse_scores))
    return None

#Function to calculate the standard deviation of the list of mean squared errors
def get_standard_deviation(list_of_mse_scores):
    """Get standard deviation
    """
    if list_of_mse_scores:
        return get_round(np.std(list_of_mse_scores))
    return None




#Function to iterate and calculate the mean squared error
def get_mean_and_std_of_mse(df_X, 
                            df_y, 
                            compiled_model,                
                            max_iteration=50, 
                            epochs=50, 
                            verbose=0):
    """Generate the mean and the standard deviation of the mean squared errors 
    """
    # Repeat steps 1 - 3, 50 times, i.e., create a list of 50 mean squared errors.    
    list_of_mean_squared_errors = []
    for i in range(max_iteration):
        start_time = time.time()
        print("-" * 36)
        print("Processing current number of iteration : {}".format(i+1))        
        mse = get_mean_squared_error(compiled_model, df_X, df_y, epochs=epochs, verbose=verbose)
        list_of_mean_squared_errors.append(mse)
        print("Duration (seconds): {}".format(time.time()-start_time))
    # end for

    print("Finished - {} times.\nAnd the list of mean squared errors : {}".format(max_iteration,
                                                                                  list_of_mean_squared_errors))
    
    mean_mse = get_mean(list_of_mean_squared_errors)
    std_mse = get_standard_deviation(list_of_mean_squared_errors)

    print("-" * 72)
    print("The mean and the standard deviation of the mean squared errors are: {} and {}, respectively".format(
           mean_mse, std_mse))
    
    return mean_mse, std_mse
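`get_standard_deviation` relies on `np.std`, which by default computes the population standard deviation (dividing by N, `ddof=0`), whereas pandas' `DataFrame.std()`, used later for normalization, defaults to the sample standard deviation (dividing by N - 1, `ddof=1`). With 50 MSE values the difference is small, but it helps to know which one the report uses. As a sketch, Python's `statistics` module exposes both: `pstdev` matches the `np.std` default and `stdev` matches the pandas default. The toy list is illustrative only.

```python
import statistics

scores = [1.0, 2.0, 3.0, 4.0]  # toy list of MSE values, for illustration only

population_std = statistics.pstdev(scores)  # divides by N, like np.std(scores)
sample_std = statistics.stdev(scores)       # divides by N - 1, like pd.Series(scores).std()

print(round(population_std, 2), round(sample_std, 2))  # 1.12 1.29
```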




In [20]:
max_iteration = 50
epochs = 50
verbose = 2

# Get the compiled model
model = regression_model()

mean_mse, std_mse = get_mean_and_std_of_mse(predictors, target, model, max_iteration=max_iteration, epochs=epochs, verbose=verbose)

------------------------------------
Processing current number of iteration : 1
Training set:  (721, 8) (721,)
Testing set:  (309, 8) (309,)
Epoch 1/50
 - 0s - loss: 42431.4934
Epoch 2/50
 - 0s - loss: 25715.4288
Epoch 3/50
 - 0s - loss: 16453.5699
Epoch 4/50
 - 0s - loss: 11494.4647
Epoch 5/50
 - 0s - loss: 8518.6648
Epoch 6/50
 - 0s - loss: 6386.2868
Epoch 7/50
 - 0s - loss: 3597.7111
Epoch 8/50
 - 0s - loss: 1373.7471
Epoch 9/50
 - 0s - loss: 1059.9671
Epoch 10/50
 - 0s - loss: 974.2333
Epoch 11/50
 - 0s - loss: 911.4578
Epoch 12/50
 - 0s - loss: 856.4017
Epoch 13/50
 - 0s - loss: 807.4115
Epoch 14/50
 - 0s - loss: 763.6117
Epoch 15/50
 - 0s - loss: 725.3675
Epoch 16/50
 - 0s - loss: 691.2619
Epoch 17/50
 - 0s - loss: 658.5349
Epoch 18/50
 - 0s - loss: 630.1255
Epoch 19/50
 - 0s - loss: 602.0749
Epoch 20/50
 - 0s - loss: 577.2500
Epoch 21/50
 - 0s - loss: 552.6402
Epoch 22/50
 - 0s - loss: 530.8192
Epoch 23/50
 - 0s - loss: 509.8607
Epoch 24/50
 - 0s - loss: 489.2211
Epoch 25/50
 - 

### Report the mean and standard deviation of the mean squared error

The mean and standard deviation of the mean squared error after 50 iterations, for the case of non-normalized data, are:

In [21]:
def get_report(name_of_case, mean_mse, std_mse):
    """Get a one-row report (DataFrame) with the mean and the standard deviation
    of the mean squared errors for the given experiment.
    """
    COL_NAME_EXPERIMENT = "Experiment"
    COL_NAME_MSE = "Mean MSE"
    COL_NAME_STD = "Std Deviation MSE"
    header_of_mse_and_std = [COL_NAME_EXPERIMENT, COL_NAME_MSE, COL_NAME_STD]
    values = [[name_of_case, mean_mse, std_mse]]

    return pd.DataFrame(columns=header_of_mse_and_std, data=values)

In [22]:
name_of_case = "Baseline-not normalized (50 epochs)"

# Report the mean and the standard deviation of the mean squared errors
df_baseline = get_report(name_of_case, mean_mse, std_mse)
df_baseline

|   | Experiment | Mean MSE | Std Deviation MSE |
|---|---|---|---|
| 0 | Baseline-not normalized (50 epochs) | 68.42 | 28.34 |


## Part B: Normalized data

Repeat Part A but use a normalized version of the data. Recall that one way to normalize the data is by subtracting the mean from the individual predictors and dividing by the standard deviation.
How does the mean of the mean squared errors compare to that from Part A?

### Data before normalization

In [23]:
predictors.head()

|   | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 |


### Data after normalization

We will normalize the data by subtracting the mean and dividing by the standard deviation of each predictor.
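In formula form, each predictor value x is replaced by (x - mean) / std, computed per column. Because `predictors.std()` is pandas' sample standard deviation, the mean and std rows of the `describe()` output above are the same statistics, so the first normalized Cement value can be checked by hand: (540.0 - 281.167864) / 104.506364 ≈ 2.476712. A minimal plain-Python sketch of per-column z-scoring follows; the `zscore` helper is hypothetical.

```python
import statistics


def zscore(column):
    """Standardize a list of numbers: subtract the mean, divide by the sample std."""
    mean = statistics.mean(column)
    std = statistics.stdev(column)  # sample std (N - 1), like pandas' .std()
    return [(value - mean) / std for value in column]


print(zscore([1.0, 2.0, 3.0]))  # [-1.0, 0.0, 1.0]

# Hand check of the first normalized Cement value, using the describe() statistics.
print(round((540.0 - 281.167864) / 104.506364, 6))  # 2.476712
```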

In [24]:
predictors_norm = (predictors - predictors.mean()) / predictors.std()
predictors_norm.head()

|   | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 2.476712 | -0.856472 | -0.846733 | -0.916319 | -0.620147 | 0.862735 | -1.217079 | -0.279597 |
| 1 | 2.476712 | -0.856472 | -0.846733 | -0.916319 | -0.620147 | 1.055651 | -1.217079 | -0.279597 |
| 2 | 0.491187 | 0.79514 | -0.846733 | 2.174405 | -1.038638 | -0.526262 | -2.239829 | 3.55134 |
| 3 | 0.491187 | 0.79514 | -0.846733 | 2.174405 | -1.038638 | -0.526262 | -2.239829 | 5.055221 |
| 4 | -0.790075 | 0.678079 | -0.846733 | 0.488555 | -1.038638 | 0.070492 | 0.647569 | 4.976069 |


In [25]:
n_colsnorm = predictors_norm.shape[1] #number of predictors

Define the regression model as above, this time using n_colsnorm as the number of inputs.

In [26]:
#define the regression model with one hidden layer
def regression_model():
    # Create model
    model = Sequential()

    model.add(Dense(10, activation="relu", input_shape=(n_colsnorm,)))
    model.add(Dense(1))

    # Compile model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model

Creating the regression model

In [27]:
model = regression_model()

Randomly split the data into training and test sets, holding out 30% of the data for testing. You can use the train_test_split helper function from Scikit-learn.

In [28]:
X_train, X_test, y_train, y_test = train_test_split(predictors_norm, target, test_size=0.3, random_state=24)   
print("Training set: ", X_train.shape, y_train.shape)
print("Testing set: ", X_test.shape, y_test.shape)

Training set:  (721, 8) (721,)
Testing set:  (309, 8) (309,)


Train the model on the training data using 50 epochs

In [29]:
model.fit(X_train, y_train, epochs=50, verbose=2)

Epoch 1/50
 - 0s - loss: 1580.2936
Epoch 2/50
 - 0s - loss: 1565.4569
Epoch 3/50
 - 0s - loss: 1550.2420
Epoch 4/50
 - 0s - loss: 1534.3472
Epoch 5/50
 - 0s - loss: 1517.6791
Epoch 6/50
 - 0s - loss: 1500.3521
Epoch 7/50
 - 0s - loss: 1482.0560
Epoch 8/50
 - 0s - loss: 1462.7950
Epoch 9/50
 - 0s - loss: 1442.5782
Epoch 10/50
 - 0s - loss: 1420.9095
Epoch 11/50
 - 0s - loss: 1398.4737
Epoch 12/50
 - 0s - loss: 1374.6691
Epoch 13/50
 - 0s - loss: 1349.8286
Epoch 14/50
 - 0s - loss: 1324.1022
Epoch 15/50
 - 0s - loss: 1297.1598
Epoch 16/50
 - 0s - loss: 1269.1467
Epoch 17/50
 - 0s - loss: 1240.6282
Epoch 18/50
 - 0s - loss: 1210.9761
Epoch 19/50
 - 0s - loss: 1180.7707
Epoch 20/50
 - 0s - loss: 1149.7748
Epoch 21/50
 - 0s - loss: 1118.2775
Epoch 22/50
 - 0s - loss: 1086.0821
Epoch 23/50
 - 0s - loss: 1053.1280
Epoch 24/50
 - 0s - loss: 1019.8849
Epoch 25/50
 - 0s - loss: 986.3940
Epoch 26/50
 - 0s - loss: 952.3473
Epoch 27/50
 - 0s - loss: 917.8836
Epoch 28/50
 - 0s - loss: 884.1236
Epoch

<keras.callbacks.callbacks.History at 0x1a32641eb8>

Evaluate the model on the test data and compute the mean squared error between the predicted concrete strength and the actual concrete strength. You can use the mean_squared_error function from Scikit-learn.

In [30]:
y_hat = model.predict(X_test)   
mse = mean_squared_error(y_test, y_hat)

In [31]:
print(mse)

281.52303027541734


In Part A, a single run on non-normalized data gave an MSE of 425.60; the same single run (same split and architecture) on normalized data gives an MSE of 281.52.

Repeat steps 1 - 3, 50 times, i.e., create a list of 50 mean squared errors

In [32]:
def get_mean_squared_error(compiled_model, X, y, epochs=50, verbose=1):
    """Run steps 1 - 3 once and return the mean squared error on the test set.
    """

    # 1. Randomly split the data into training and test sets, holding out 30%
    # of the data for testing (train_test_split from Scikit-learn).
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=24)
    print("Training set: ", X_train.shape, y_train.shape)
    print("Testing set: ", X_test.shape, y_test.shape)

    # 2. Train the given compiled model on the training data.
    compiled_model.fit(X_train, y_train, epochs=epochs, verbose=verbose)

    # 3. Evaluate the model on the test data: mean squared error between the
    # predicted and the actual concrete strength (mean_squared_error from Scikit-learn).
    y_hat = compiled_model.predict(X_test)
    mse = mean_squared_error(y_test, y_hat)

    # Return the mean squared error
    return mse


In [34]:
max_iteration = 50
epochs = 50
verbose = 2

# Get the compiled model
model = regression_model()

mean_mse, std_mse = get_mean_and_std_of_mse(predictors_norm, target, model, max_iteration=max_iteration, epochs=epochs, verbose=verbose)

------------------------------------
Processing current number of iteration : 1
Training set:  (721, 8) (721,)
Testing set:  (309, 8) (309,)
Epoch 1/50
 - 0s - loss: 1548.6876
Epoch 2/50
 - 0s - loss: 1529.2276
Epoch 3/50
 - 0s - loss: 1509.7447
Epoch 4/50
 - 0s - loss: 1489.1789
Epoch 5/50
 - 0s - loss: 1468.6996
Epoch 6/50
 - 0s - loss: 1447.3221
Epoch 7/50
 - 0s - loss: 1424.8728
Epoch 8/50
 - 0s - loss: 1401.4830
Epoch 9/50
 - 0s - loss: 1377.4726
Epoch 10/50
 - 0s - loss: 1352.0345
Epoch 11/50
 - 0s - loss: 1325.3782
Epoch 12/50
 - 0s - loss: 1297.8572
Epoch 13/50
 - 0s - loss: 1269.2500
Epoch 14/50
 - 0s - loss: 1239.3789
Epoch 15/50
 - 0s - loss: 1208.5932
Epoch 16/50
 - 0s - loss: 1177.1423
Epoch 17/50
 - 0s - loss: 1144.4714
Epoch 18/50
 - 0s - loss: 1111.5249
Epoch 19/50
 - 0s - loss: 1078.0212
Epoch 20/50
 - 0s - loss: 1043.2538
Epoch 21/50
 - 0s - loss: 1008.7613
Epoch 22/50
 - 0s - loss: 973.9042
Epoch 23/50
 - 0s - loss: 938.6763
Epoch 24/50
 - 0s - loss: 903.7288
Epoch 2

### Report the mean and standard deviation of the mean squared error

The mean and standard deviation of the mean squared error after 50 iterations, for the case of normalized data (50 epochs), are:

In [36]:
name_of_case = "Baseline normalized (50 epochs)"

# Report the mean and the standard deviation of the mean squared errors
df_baseline_norm = get_report(name_of_case, mean_mse, std_mse)
df_baseline_norm

|   | Experiment | Mean MSE | Std Deviation MSE |
|---|---|---|---|
| 0 | Baseline normalized (50 epochs) | 49.86 | 35.23 |


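Comparing the non-normalized and normalized cases: normalizing the predictors lowers the mean of the MSE from 68.42 to 49.86, while the standard deviation of the MSE rises from 28.34 to 35.23.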

In [37]:
# Create a data frame with the summary
df_summary = pd.concat([df_baseline, df_baseline_norm], axis=0)

# Review the result dataframe
df_summary.reset_index(drop=True)


|   | Experiment | Mean MSE | Std Deviation MSE |
|---|---|---|---|
| 0 | Baseline-not normalized (50 epochs) | 68.42 | 28.34 |
| 1 | Baseline normalized (50 epochs) | 49.86 | 35.23 |


### Part C: Normalized data with 100 epochs

Repeat Part B but use 100 epochs this time for training.

How does the mean of the mean squared errors compare to that from Part B?

#### Building the model with normalized data and increasing to 100 epochs

In [38]:
max_iteration = 50
epochs = 100
verbose = 0

# Get the compiled model
model = regression_model()

mean_mse, std_mse = get_mean_and_std_of_mse(predictors_norm, target, model, max_iteration=max_iteration, epochs=epochs, verbose=verbose)

------------------------------------
Processing current number of iteration : 1
Training set:  (721, 8) (721,)
Testing set:  (309, 8) (309,)
Duration (seconds): 4.3207032680511475
------------------------------------
Processing current number of iteration : 2
Training set:  (721, 8) (721,)
Testing set:  (309, 8) (309,)
Duration (seconds): 3.371795892715454
------------------------------------
Processing current number of iteration : 3
Training set:  (721, 8) (721,)
Testing set:  (309, 8) (309,)
Duration (seconds): 2.5873563289642334
------------------------------------
Processing current number of iteration : 4
Training set:  (721, 8) (721,)
Testing set:  (309, 8) (309,)
Duration (seconds): 4.480020046234131
------------------------------------
Processing current number of iteration : 5
Training set:  (721, 8) (721,)
Testing set:  (309, 8) (309,)
Duration (seconds): 3.949136257171631
------------------------------------
Processing current number of iteration : 6
Training set:  (721, 8)

### Report the mean and standard deviation of the mean squared error

The mean and standard deviation of the mean squared error after 50 iterations, for the case of normalized data trained for 100 epochs, are:

In [40]:
name_of_case = "Baseline normalized (100 epochs)"

# Report the mean and the standard deviation of the mean squared errors
df_baseline_norm100 = get_report(name_of_case, mean_mse, std_mse)
df_baseline_norm100

|   | Experiment | Mean MSE | Std Deviation MSE |
|---|---|---|---|
| 0 | Baseline normalized (100 epochs) | 42.58 | 16.57 |


Comparing all three experiments, including the normalized cases run with 50 and 100 epochs:

In [41]:
# Create a data frame with the summary
df_summary = pd.concat([df_baseline, df_baseline_norm, df_baseline_norm100], axis=0)

# Review the result dataframe
df_summary.reset_index(drop=True)



|   | Experiment | Mean MSE | Std Deviation MSE |
|---|---|---|---|
| 0 | Baseline-not normalized (50 epochs) | 68.42 | 28.34 |
| 1 | Baseline normalized (50 epochs) | 49.86 | 35.23 |
| 2 | Baseline normalized (100 epochs) | 42.58 | 16.57 |


The results show that increasing the number of epochs from 50 to 100 reduced both the mean of the MSE (mean squared error), from 49.86 to 42.58, and its standard deviation, from 35.23 to 16.57.
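The relative changes between experiments can be computed directly from the values in the summary table. In the sketch below (the `pct_change` helper and the `results` dictionary are illustrative), normalization lowers the mean MSE by about 27% but raises its standard deviation by about 24%, while going from 50 to 100 epochs lowers the mean MSE by about 15% and the standard deviation by about 53%.

```python
def pct_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100


# (mean MSE, std of MSE) copied from the summary table above.
results = {
    "not normalized, 50 epochs": (68.42, 28.34),
    "normalized, 50 epochs": (49.86, 35.23),
    "normalized, 100 epochs": (42.58, 16.57),
}

comparisons = [
    ("not normalized, 50 epochs", "normalized, 50 epochs"),
    ("normalized, 50 epochs", "normalized, 100 epochs"),
]
for before, after in comparisons:
    mean_change = pct_change(results[before][0], results[after][0])
    std_change = pct_change(results[before][1], results[after][1])
    print(f"{before} -> {after}: mean MSE {mean_change:+.2f}%, std {std_change:+.2f}%")
```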