In [1]:
# Importing the neccessary libraries
import pandas as pd
import numpy as np
import statistics as stats
import os
import keras
from keras.models import Sequential
from keras.layers import Dense
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from datetime import datetime

# Forcing keras to use CPU.
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"

In [2]:
# Reading the Data and storing it in a dataframe

df = pd.read_csv('https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DL0101EN/labs/data/concrete_data.csv')
df.head()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age,Strength
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28,79.99
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28,61.89
2,332.5,142.5,0.0,228.0,0.0,932.0,594.0,270,40.27
3,332.5,142.5,0.0,228.0,0.0,932.0,594.0,365,41.05
4,198.6,132.4,0.0,192.0,0.0,978.4,825.5,360,44.3


In [3]:
print('\nShape of dataframe : ',df.shape)


Shape of dataframe :  (1030, 9)


In [4]:
# Summary of the dataset
df.describe()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age,Strength
count,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0
mean,281.167864,73.895825,54.18835,181.567282,6.20466,972.918932,773.580485,45.662136,35.817961
std,104.506364,86.279342,63.997004,21.354219,5.973841,77.753954,80.17598,63.169912,16.705742
min,102.0,0.0,0.0,121.8,0.0,801.0,594.0,1.0,2.33
25%,192.375,0.0,0.0,164.9,0.0,932.0,730.95,7.0,23.71
50%,272.9,22.0,0.0,185.0,6.4,968.0,779.5,28.0,34.445
75%,350.0,142.95,118.3,192.0,10.2,1029.4,824.0,56.0,46.135
max,540.0,359.4,200.1,247.0,32.2,1145.0,992.6,365.0,82.6


In [5]:
# Creating dataframes of features (X) and target (Y)
X = df.iloc[:, 0:8]
Y = df.iloc[:,8]

# Printing the dataframes X and Y to ensure we have created the dataframes with the correct columns
print('The features or the predictors (X) are : ', X, '\n\n') 
print('The target (Y) is : ', Y, '\n\n')

The features or the predictors (X) are :        Cement  Blast Furnace Slag  ...  Fine Aggregate  Age
0      540.0                 0.0  ...           676.0   28
1      540.0                 0.0  ...           676.0   28
2      332.5               142.5  ...           594.0  270
3      332.5               142.5  ...           594.0  365
4      198.6               132.4  ...           825.5  360
...      ...                 ...  ...             ...  ...
1025   276.4               116.0  ...           768.3   28
1026   322.2                 0.0  ...           813.4   28
1027   148.5               139.4  ...           780.0   28
1028   159.1               186.7  ...           788.9   28
1029   260.9               100.5  ...           761.5   28

[1030 rows x 8 columns] 


The target (Y) is :  0       79.99
1       61.89
2       40.27
3       41.05
4       44.30
        ...  
1025    44.28
1026    31.18
1027    23.70
1028    32.77
1029    32.40
Name: Strength, Length: 1030, dtype: float64 



<b>Note 1</b> : Unlike the method in this course, the splitting is done using indexing instead of using the names of the columns. Additionally, a different notation is used. The word <i>features</i> is used instead of <i>predictors</i>.


<b>Note 2</b> : Pandas indexes columns starting from 0. Note in the code below for the features (X) indexing is used as `[:, 0:8]`. The first part preceding the coma `(:)` tells pandas to include ALL rows of the original dataframe (df) in the new dataframe called X while the part succedding the comma `(0:8)` tells pandas to include all columns of the original dataframe (df) starting from column with index = 0 and ending with column with index = 7, <b> but not to include the column with index = 8 </b>  

<b>Note 3</b> : In order to split the data into train and test sets, the train_test_split function of the sklearn library is used. `The random_state` is used to ensure that the train and test split is the same each time, i.e. the train set and the test set have the same samples each time the code is run which is good for reproducing the results. If left empty, the random state is used by `np.random`. Since the Project requires splitting data the into <b>random</b> sets, hence `random_state` is not used, i.e. no value is set for random state. As the data hase to be split randomly  into train and test sets <b>50</b> times, a for loop will be used to to split the data in train test sets for <b>each model</b> 

In [6]:
def regression_model() :
    
    # Create the model
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(X.shape[1],)))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(1))

    # Compile the model
    model.compile(optimizer='adam', loss='mean_squared_error')
    
    return model

In [7]:
def data_split() :
    X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3)
    
    # Create a list containing X_train, X_test, Y_train, Y_test and return the list
    splits = [X_train, X_test, Y_train, Y_test] 
    return splits

<b>Note </b> : In the above function `data_split()` the <i>X_train, X_test, Y_train, Y_test</i> sets are stored in a list and list is returned. This is to ensure that the <i>X_train, X_test, Y_train, Y_test</i> sets are not printed when the function is called

In [8]:
def predict() :
    return model.predict(X_test)

def calculate_mse() :
    return mean_squared_error(Y_test,Y_predicted)

<b>Note 1</b> : The function `regression_model` as defined above only **compiles** the model and doesn't fit the model to training set. This is because in PART C, the number of epochs are changed and this would allow to fit the model with new number of epochs. This function will be used for **PART A**, **PART B** and **PART C**.

<b>Note 2</b> : Since the splitting data, prediction and mean squared error calculations are all being used repeatly as well, hence there are separate functions created for them to remove redundancy of typing the same line of code while also making the code more neat. However,  as we the features (X) are to be normalized only **once**, hence there is no need to create a function for it

<b>Note 3</b> : As the split function is executed first, the training and test sets are obtained and hence there is no need to explicitly pass any arguments to `predict()` and `calculate_mse()` functions. 

# <font color = blue> PART D : BASELINE MODEL WITH INCREASED HIDDEN LAYERS </font>


In this part, all the tasks from <b>PART B</b> are performed, but this time the number of hidden layers are increased to 3

<b>The new model will have : </b>
<ul>
        <li> Input layer with 10 nodes </li>
        <li> 3 hidden layers, each with 10 nodes and ReLU activation function </li>
        <li> Adam optimizer and mean squared error loss function </li>
</ul>


## <font color = darkorange>Task 1 : Train and Test the Baseline Model with 100 epochs and Increased Hidden Layers</font>

In order to train and test the the baseline model with normalized features, 100 epochs and increased hidden layers, the following steps are performed :
<ol>
    <li>Normalize the features (X)</li>
    <li>Randomly split the data into <i>X_train, X_test, Y_train, Y_test</i> sets</li>
    <li>Create a new model with 100 epochs</li>
    <li>Train the model using <i>X_train, Y_train</i> using <b>50</b> epochs</li>
    <li>Evaluate the model using <i>X_test, Y_test</i></li>
    <li>Get the predictions on the <i>X_test</i> set </li>
    <li>Compute the <b>mean squared error</b> on the test set using sklearn</li>
</ol>

### <font color = #2980B9> Step 1 : Normalize the features (X) </font>

In [18]:
X=(X-X.mean())/X.std()

### <font color = #2980B9> Step 2 : Randomly split the data into <i>X_train, X_test, Y_train, Y_test </font>

In [19]:
# Creating X_train, X_test, Y_train and Y_test sets
X_train, X_test, Y_train, Y_test = data_split()

### <font color = #2980B9> Step 3 : Create a new regression model with 3 hidden layers, each with 10 nodes and ReLU activation  </font>

In [20]:
def three_layer_regression_model () :
    
    # Create the model
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(X.shape[1],)))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(1))

    # Compile the model
    model.compile(optimizer='adam', loss='mean_squared_error')
    
    return model
    

### <font color = #2980B9> Step 4 : Train the model using <i>X_train, Y_train</i> using 50 epochs </font>

In [21]:
model = three_layer_regression_model()

# Fit the model on the train set
model.fit(X_train, Y_train, validation_split=0.3, epochs=100)

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoch 70/100
Epoch 71/100
Epoch 72/100
Epoch 73/100
Epoch 74/100
Epoch 75/100
Epoch 76/100
Epoch 77/100
Epoch 78

<tensorflow.python.keras.callbacks.History at 0x7f318c8114e0>

### <font color = #2980B9> Step 5 : Get the predictions on the X_test </font>

In [22]:
# Store the predictions in a variable Y_Predicted
Y_predicted = predict()

### <font color = #2980B9> Step 6 : Compute the <i>mean squared error</i> on the test set using sklearn </font>

In [23]:
# Calculate the mean square error

mse = calculate_mse()
print('Mean Square Error (MSE) of the Baseline Model with Normalized Features is : ' , str(mse))

Mean Square Error (MSE) of the Baseline Model with Normalized Features is :  109.7375954263577


## <font color = darkorange>Task 2 : Create 50 Models and Calculate the <i>µ</i> & <i>σ</i> of their MSE Errors with new Features</font>

In order to train 50 models with new features (normalized features) and calulate the mean (µ) and standard deviation (σ) of their mean square errors (MSE) the following steps are performed :
<ol>
    <li>Create an empty list <code>list_of_mse</code> to store the mean square error of each of the models</li>
    <li>Define a for loop and perform each of the following steps in the loop</li>
        <ol>
        <li>Randomly split the data into <i>X_train, X_test, Y_train, Y_test</i> sets</li>
        <li>Train the model using <i>X_train, Y_train</i> using <b>50</b> epochs</li>
        <li>Evaluate the model using <i>X_test, Y_test</i></li>
        <li>Get the predictions on the <i>X_test</i> set </li>
        <li>Compute the <b>mean squared error</b> on the test set using sklearn</li>
    </ol>
</ol>

### <font color = #2980B9> Step 1 : Creating the <i>list_of_mse</i> list</font>

In [24]:
# Create the empty lists
list_of_mse = []

### <font color = #2980B9> Step 2 : Creating 50 Models, Calculating The MSE for Each and <i>µ</i> & <i>σ</i> of the 50 MSE Values in <i>list_of_mse</i> </font>

In [25]:
# Create the for loop to split the data, create, compile & fit model, evaluate & nake predictions, caluclate mse and store
# in list_of_mse

start_time = datetime.now() # Starting time of the for loop execution

for i in range(50) :
    # Split the data into train and test set
    X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3)
    
    # Create and compile the regression model using the function regression_model as defined in TASK 1
    model = regression_model()

    # Fit the model on the train set
    print('\n\n\nTraining Model # ' , i+1 , '\n\n') # Print the Model Number that is being trained
    model.fit(X_train, Y_train, validation_split=0.3, epochs=100)
    print('\n')
    
    # Make prediction on the test set
    Y_predicted = model.predict(X_test)
    
    # Calculate the mean square error
    mse = mean_squared_error(Y_test, Y_predicted)
    
    # Add the mse to the list_of_mse list
    list_of_mse.append(mse)

end_time = datetime.now() # Ending time of the for loop execution

# Print time taken for fitting 50 models and calucating the Mean and Standard Deviation of MSE of 50 models
print('\n\nTotal Execution Time : ' , format(end_time - start_time))
    

[1;30;43mStreaming output truncated to the last 5000 lines.[0m
Epoch 100/100





Training Model #  27 


Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoc

In [26]:
# Calculate the Mean of the MSE of 50 models
mean_of_mse = stats.mean(list_of_mse)

# Calculate the Standard Deviation of the MSE of 50 models
std_of_mse = stats.stdev(list_of_mse)

# Print the Mean and Standard Deviation of MSE of 50 models
print('\n\nMean of the MSE of 50 Models : ' , str(mean_of_mse))
print('Standard Deviation of MSE of 50 Models : ' , str(std_of_mse))



Mean of the MSE of 50 Models :  133.6545873251317
Standard Deviation of MSE of 50 Models :  15.60317835661413


#### <font color = green> Comparision of Mean of MSE with Mean of MSE with PART C </font>
<table style="width:30%">
  <tr>
    <th>Mean of MSE of PART A</th>
    <th>Mean of MSE of PART B</th>
    <th>Mean of MSE of PART C</th>
    <th>Mean of MSE of PART D</th>
  </tr>
  <tr>
    <td>177.27</td>
    <td>176.27</td>
    <td>133.79</td>
    <td>133.65</td>
  </tr>
</table>

The table above compares the Mean of **MSE for PART A**, **Mean of MSE for PART B**, **Mean of MSE for PART C** and **Mean of MSE for PART D**. As can be seen, the value of Mean of MSE of PART D is marginally smaller than that of PART C and is the smallest value obtained. This shows that the effect of **normalizing the features** as well as **increasing the number of epochs by 2** yield the best results in terms of the performance of the regression model and helps it in finding the line of best fit

# <font color = ac36e3> END NOTE </font>

Although the results above table show that the best performance is achieved by normalizing the features, increasing the number of epochs **and** increasing the number of hidden layers, this might not be decisive. Repeating ***TASK 2*** for **PART A**, **PART B**, **PART C** and **PART D** several times shows different results. However, for the purposes of this project, those results are not included. 