## Project 2C

Let's start by importing the *pandas* and NumPy libraries.

In [1]:
import pandas as pd
import numpy as np

Let's read the dataset into a *pandas* dataframe.

In [2]:
concrete_data = pd.read_csv('concrete_data.csv')
concrete_data.head()

|   | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 | 79.99 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 | 61.89 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 40.27 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 | 41.05 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 | 44.3 |


In [3]:
concrete_data.shape

(1030, 9)

So there are 1,030 samples to train and test our model on. Because the dataset is small, we have to be careful not to overfit the training data.

Let's look at some summary statistics and check the dataset for missing values.

In [4]:
concrete_data.describe()

|   | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| count | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 |
| mean | 281.167864 | 73.895825 | 54.18835 | 181.567282 | 6.20466 | 972.918932 | 773.580485 | 45.662136 | 35.817961 |
| std | 104.506364 | 86.279342 | 63.997004 | 21.354219 | 5.973841 | 77.753954 | 80.17598 | 63.169912 | 16.705742 |
| min | 102.0 | 0.0 | 0.0 | 121.8 | 0.0 | 801.0 | 594.0 | 1.0 | 2.33 |
| 25% | 192.375 | 0.0 | 0.0 | 164.9 | 0.0 | 932.0 | 730.95 | 7.0 | 23.71 |
| 50% | 272.9 | 22.0 | 0.0 | 185.0 | 6.4 | 968.0 | 779.5 | 28.0 | 34.445 |
| 75% | 350.0 | 142.95 | 118.3 | 192.0 | 10.2 | 1029.4 | 824.0 | 56.0 | 46.135 |
| max | 540.0 | 359.4 | 200.1 | 247.0 | 32.2 | 1145.0 | 992.6 | 365.0 | 82.6 |


In [5]:
concrete_data.isnull().sum()

Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64

None of the columns has missing values, so the data is ready to be used to build our model.

The target variable in this problem is the concrete sample strength. Therefore, our predictors will be all the other columns.

In [6]:
concrete_data_columns = concrete_data.columns
predictors = concrete_data[concrete_data_columns[concrete_data_columns != 'Strength']] # all columns except Strength
target = concrete_data['Strength'] # Strength column
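An equivalent and arguably more direct way to separate the predictors from the target is `DataFrame.drop`. The sketch below runs on a tiny hypothetical frame (`toy`) standing in for `concrete_data`, and uses new names (`toy_predictors`, `toy_target`) so it does not overwrite the notebook's `predictors` and `target`.

```python
import pandas as pd

# hypothetical two-row stand-in for concrete_data (fewer columns, same idea)
toy = pd.DataFrame({
    "Cement": [540.0, 332.5],
    "Water": [162.0, 228.0],
    "Strength": [79.99, 40.27],
})

# drop() returns a new DataFrame without the target column; toy itself is unchanged
toy_predictors = toy.drop(columns="Strength")
toy_target = toy["Strength"]

print(list(toy_predictors.columns))
```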

Let's do a quick sanity check of the predictors and the target dataframes.

In [7]:
predictors.head()

|   | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 |


In [8]:
target.head()

0    79.99
1    61.89
2    40.27
3    41.05
4    44.30
Name: Strength, dtype: float64

Finally, let's normalize the data by subtracting the mean and dividing by the standard deviation.
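Written out for predictor column $j$ over $n$ rows, this is the z-score below. pandas' `std()` computes the sample standard deviation (denominator $n-1$) by default, which is what $s_j$ denotes here.

$$
z_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j},\qquad
\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij},\qquad
s_j = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_{ij} - \bar{x}_j\right)^2}
$$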

In [9]:
predictors_norm = (predictors - predictors.mean()) / predictors.std()
predictors_norm.head()

|   | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 2.476712 | -0.856472 | -0.846733 | -0.916319 | -0.620147 | 0.862735 | -1.217079 | -0.279597 |
| 1 | 2.476712 | -0.856472 | -0.846733 | -0.916319 | -0.620147 | 1.055651 | -1.217079 | -0.279597 |
| 2 | 0.491187 | 0.79514 | -0.846733 | 2.174405 | -1.038638 | -0.526262 | -2.239829 | 3.55134 |
| 3 | 0.491187 | 0.79514 | -0.846733 | 2.174405 | -1.038638 | -0.526262 | -2.239829 | 5.055221 |
| 4 | -0.790075 | 0.678079 | -0.846733 | 0.488555 | -1.038638 | 0.070492 | 0.647569 | 4.976069 |
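A quick way to confirm the normalization is to check that every normalized column has a mean close to 0 and a standard deviation close to 1. The sketch uses a small hypothetical frame (`toy_df`) in place of `predictors`. It also shows that NumPy's `np.std` defaults to `ddof=0` (population standard deviation), so on data scaled with pandas' `std()` it returns a value slightly below 1.

```python
import numpy as np
import pandas as pd

# hypothetical stand-in for the predictors DataFrame
toy_df = pd.DataFrame({"Cement": [540.0, 332.5, 198.6, 266.0],
                       "Age": [28, 270, 365, 90]})

# same z-score expression as in the notebook
toy_norm = (toy_df - toy_df.mean()) / toy_df.std()

col_means = toy_norm.mean()  # should be ~0 for every column
col_stds = toy_norm.std()    # pandas uses ddof=1, so exactly ~1
print(col_means)
print(col_stds)

# NumPy's default ddof=0 gives sqrt((n - 1) / n) here, not 1
population_std = np.std(toy_norm["Cement"].to_numpy())
print(population_std)
```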


In [10]:
n_cols = predictors_norm.shape[1] # number of predictors

## Building the Keras model

In [11]:
import keras

Keras runs here on the TensorFlow backend (the `History` object returned by `fit` further below comes from `tensorflow.python.keras`).

Let's import the rest of the packages from the Keras library that we will need to build our regression model.

In [12]:
from keras.models import Sequential
from keras.layers import Dense

In [13]:
# define the model itself
def regression_model():
    # creating the model
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(1))
    
    # compiling model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model

The function above creates a model with one hidden layer of 10 neurons using the ReLU activation function. It is compiled with the Adam optimizer and uses the mean squared error as the loss function.
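As a sanity check on the model size: a `Dense` layer with a bias has `inputs * units + units` trainable parameters. Assuming the 8 predictor columns of this dataset (the value of `n_cols`), the arithmetic is sketched below; `model.summary()` should report the same total. The helper `dense_params` and the name `n_predictors` are hypothetical, chosen so the notebook's `n_cols` is not overwritten.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Trainable parameters of a fully connected layer with bias: weights + biases."""
    return n_inputs * n_units + n_units

n_predictors = 8  # assumption: the 8 predictor columns of this dataset

hidden_params = dense_params(n_predictors, 10)  # 8 * 10 + 10
output_params = dense_params(10, 1)             # 10 * 1 + 1
print(hidden_params, output_params, hidden_params + output_params)
```

About 100 parameters is small relative to the roughly 700 training rows, which fits the earlier concern about overfitting a small dataset.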

Let's import scikit-learn in order to randomly split the data into training and test sets.

In [14]:
from sklearn.model_selection import train_test_split

Split the data into training and test sets, holding out 30% of the data for testing.

In [15]:
X_train, X_test, y_train, y_test = train_test_split(predictors_norm, target, test_size=0.3, random_state=42)
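With a float `test_size`, scikit-learn rounds the test-set size up (`ceil(test_size * n_samples)`) and gives the remaining rows to training. For the 1,030 rows here, the sketch below computes the resulting sizes; `X_train.shape` and `X_test.shape` in the notebook should agree.

```python
import math

n_samples = 1030  # rows in concrete_data, from concrete_data.shape
test_size = 0.3

n_test = math.ceil(test_size * n_samples)  # test rows, rounded up
n_train = n_samples - n_test               # everything else goes to training
print(n_train, n_test)
```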

## Train and Test 

Let's call the function now to create our model.

In [16]:
# build the model
model = regression_model()

Next, we will train the model for 100 epochs.

In [17]:
# fit the model
epochs = 100
model.fit(X_train, y_train, epochs=epochs, verbose=2)

Epoch 1/100
23/23 - 0s - loss: 1639.9762
Epoch 2/100
23/23 - 0s - loss: 1622.0172
Epoch 3/100
23/23 - 0s - loss: 1604.0128
Epoch 4/100
23/23 - 0s - loss: 1586.4027
Epoch 5/100
23/23 - 0s - loss: 1568.1458
Epoch 6/100
23/23 - 0s - loss: 1550.0372
Epoch 7/100
23/23 - 0s - loss: 1530.8644
Epoch 8/100
23/23 - 0s - loss: 1511.4619
Epoch 9/100
23/23 - 0s - loss: 1491.4005
Epoch 10/100
23/23 - 0s - loss: 1470.5549
Epoch 11/100
23/23 - 0s - loss: 1449.0493
Epoch 12/100
23/23 - 0s - loss: 1427.0072
Epoch 13/100
23/23 - 0s - loss: 1404.1825
Epoch 14/100
23/23 - 0s - loss: 1380.7894
Epoch 15/100
23/23 - 0s - loss: 1356.4185
Epoch 16/100
23/23 - 0s - loss: 1331.5461
Epoch 17/100
23/23 - 0s - loss: 1305.6908
Epoch 18/100
23/23 - 0s - loss: 1279.4860
Epoch 19/100
23/23 - 0s - loss: 1252.4337
Epoch 20/100
23/23 - 0s - loss: 1225.2687
Epoch 21/100
23/23 - 0s - loss: 1197.2546
Epoch 22/100
23/23 - 0s - loss: 1168.2578
Epoch 23/100
23/23 - 0s - loss: 1139.4092
Epoch 24/100
23/23 - 0s - loss: 1109.3973
...

<tensorflow.python.keras.callbacks.History at 0x140937f50>
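The `23/23` in each log line is the number of mini-batches per epoch. When `fit` is called without `batch_size`, Keras uses 32, so the 721 training rows (70% of 1,030) are processed in `ceil(721 / 32)` batches, the last one holding the 17 leftover samples:

```python
import math

n_train = 721    # training rows after the 70/30 split of 1,030 samples
batch_size = 32  # Keras default when fit() gets no batch_size

steps_per_epoch = math.ceil(n_train / batch_size)
last_batch = n_train - (steps_per_epoch - 1) * batch_size
print(steps_per_epoch, last_batch)
```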

Next, we evaluate the model on the test data.

In [18]:
loss_val = model.evaluate(X_test, y_test)
y_pred = model.predict(X_test)
loss_val



163.85171508789062

Now we need to compute the mean squared error between the predicted concrete strength and the actual concrete strength.
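For reference, with $y_i$ the actual strength, $\hat{y}_i$ the predicted strength and $n$ the number of test samples:

$$
\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2
$$

Because the model was compiled with `loss='mean_squared_error'`, this is the same quantity `model.evaluate` returned above, up to floating-point precision.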

Let's import the `mean_squared_error` function from scikit-learn.

In [19]:
from sklearn.metrics import mean_squared_error

In [20]:
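# MSE between the actual and the predicted strength on the test set
mean_square_error = mean_squared_error(y_test, y_pred)
# a single MSE value: its mean is the value itself and its standard deviation is 0
mean = np.mean(mean_square_error)
standard_deviation = np.std(mean_square_error)
print(mean, standard_deviation)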

163.85169936599985 0.0
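`model.predict` returns a 2-D array of shape `(n, 1)`, while `y_test` is 1-D with shape `(n,)`. `mean_squared_error` handles this, but computing the MSE by hand with NumPy needs the predictions flattened first; otherwise broadcasting silently produces an `(n, n)` matrix of differences. A small sketch with made-up numbers (`y_true`, `y_hat` are hypothetical stand-ins):

```python
import numpy as np

# made-up values standing in for y_test (1-D) and model.predict output (2-D, one column)
y_true = np.array([40.0, 35.0, 50.0])
y_hat = np.array([[42.0], [33.0], [47.0]])

wrong = np.mean((y_true - y_hat) ** 2)          # broadcasts to a 3x3 matrix first
right = np.mean((y_true - y_hat.ravel()) ** 2)  # element-wise differences -2, 2, 3

print((y_true - y_hat).shape)  # (3, 3)
print(right)                   # (4 + 4 + 9) / 3
```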


Finally, repeat the split, train, and evaluate steps 50 times to build a list of 50 mean squared errors, then report their mean and standard deviation.
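The loop in the next cell calls `fit` on the same `model` object in every repetition, so each run starts from the weights left by the previous one (including the 100 epochs of the earlier cell), and rows used as test data in one split may already have been trained on in an earlier split. That is a likely reason the MSEs shrink over the first iterations. If the goal is 50 independent estimates, the model should be re-created inside the loop. The sketch below shows that structure with a hypothetical `make_model` factory argument (in the notebook this would be `regression_model`) and a stand-in least-squares class so it runs without Keras; the class and the function name `repeated_holdout_mse` are assumptions, not part of the original.

```python
import numpy as np

class LeastSquaresModel:
    """Hypothetical stand-in for a Keras model: linear regression via least squares."""

    def fit(self, X, y):
        X1 = np.column_stack([X, np.ones(len(X))])  # add an intercept column
        self.coef_, *_ = np.linalg.lstsq(X1, y, rcond=None)
        return self

    def predict(self, X):
        X1 = np.column_stack([X, np.ones(len(X))])
        return X1 @ self.coef_

def repeated_holdout_mse(X, y, make_model, n_repeats=50, test_size=0.3, fit_kwargs=None):
    """Return one test MSE per repetition, building a fresh model every time."""
    n = len(X)
    n_test = int(np.ceil(test_size * n))
    mses = []
    for seed in range(n_repeats):
        rng = np.random.default_rng(seed)  # new random split per repetition
        idx = rng.permutation(n)
        test_idx, train_idx = idx[:n_test], idx[n_test:]
        model = make_model()               # fresh, untrained model each time
        model.fit(X[train_idx], y[train_idx], **(fit_kwargs or {}))
        y_hat = np.ravel(model.predict(X[test_idx]))  # flatten (n, 1) predictions
        mses.append(np.mean((y[test_idx] - y_hat) ** 2))
    return np.array(mses)

# usage with synthetic data (made up for illustration)
data_rng = np.random.default_rng(0)
X_demo = data_rng.normal(size=(200, 3))
y_demo = X_demo @ np.array([1.0, -2.0, 0.5]) + data_rng.normal(scale=0.1, size=200)

mses = repeated_holdout_mse(X_demo, y_demo, LeastSquaresModel)
print(mses.mean(), mses.std())
```

In the notebook, a call along the lines of `repeated_holdout_mse(predictors_norm.to_numpy(), target.to_numpy(), regression_model, fit_kwargs={"epochs": 100, "verbose": 0})` would follow the same pattern. It draws splits with NumPy permutations rather than `train_test_split`, so the individual splits differ from the ones below.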

In [21]:
total_mean_squared_errors = 50  # number of repetitions
epochs = 100                    # epochs per training run
mean_squared_errors = []
for i in range(total_mean_squared_errors):
    # new random 70/30 split for each repetition, seeded by i for reproducibility
    X_train, X_test, y_train, y_test = train_test_split(predictors_norm, target, test_size=0.3, random_state=i)
    # note: `model` is not rebuilt here, so training continues from the previous run's weights
    model.fit(X_train, y_train, epochs=epochs, verbose=0)
    MSE = model.evaluate(X_test, y_test, verbose=0)
    print("MSE "+str(i+1)+": "+str(MSE))
    y_pred = model.predict(X_test)
    mean_square_error = mean_squared_error(y_test, y_pred)
    mean_squared_errors.append(mean_square_error)

mean_squared_errors = np.array(mean_squared_errors)
mean = np.mean(mean_squared_errors)
standard_deviation = np.std(mean_squared_errors)  # population standard deviation (ddof=0)

print('\n')
print("Below is the mean and standard deviation of " +str(total_mean_squared_errors) + " mean squared errors with normalized data. Total number of epochs for each training is: " +str(epochs) + "\n")
print("Mean: "+str(mean))
print("Standard Deviation: "+str(standard_deviation))

MSE 1: 92.90280151367188
MSE 2: 72.89657592773438
MSE 3: 45.128334045410156
MSE 4: 41.41788864135742
MSE 5: 40.89964294433594
MSE 6: 43.26124954223633
MSE 7: 43.950382232666016
MSE 8: 33.9612922668457
MSE 9: 37.5832633972168
MSE 10: 36.8336296081543
MSE 11: 36.9637336730957
MSE 12: 32.54510498046875
MSE 13: 41.16047286987305
MSE 14: 40.655967712402344
MSE 15: 35.097721099853516
MSE 16: 30.85770606994629
MSE 17: 33.12465286254883
MSE 18: 33.22416687011719
MSE 19: 32.20165252685547
MSE 20: 35.859432220458984
MSE 21: 31.51734733581543
MSE 22: 32.2318229675293
MSE 23: 27.29950714111328
MSE 24: 32.86106872558594
MSE 25: 33.01205825805664
MSE 26: 34.990657806396484
MSE 27: 29.890884399414062
MSE 28: 30.436325073242188
MSE 29: 35.37593078613281
MSE 30: 35.04522705078125
MSE 31: 31.48307228088379
MSE 32: 30.4407958984375
MSE 33: 31.57286834716797
MSE 34: 32.47333908081055
MSE 35: 34.725189208984375
MSE 36: 40.25605773925781
MSE 37: 27.05064582824707
MSE 38: 34.96195602416992
MSE 39: 30.3407726