## Project 2B

Let's start by importing the <em>pandas</em> and NumPy libraries.

In [1]:
import pandas as pd
import numpy as np

Let's read the dataset into a <em>pandas</em> dataframe.

In [2]:
concrete_data = pd.read_csv('concrete_data.csv')
concrete_data.head()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age,Strength
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28,79.99
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28,61.89
2,332.5,142.5,0.0,228.0,0.0,932.0,594.0,270,40.27
3,332.5,142.5,0.0,228.0,0.0,932.0,594.0,365,41.05
4,198.6,132.4,0.0,192.0,0.0,978.4,825.5,360,44.3


In [3]:
concrete_data.shape

(1030, 9)

So, there are 1030 samples (rows) and 9 columns to train our model on. With so few samples, we have to be careful not to overfit the training data.

Let's look at summary statistics and check the dataset for any missing values.

In [4]:
concrete_data.describe()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age,Strength
count,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0
mean,281.167864,73.895825,54.18835,181.567282,6.20466,972.918932,773.580485,45.662136,35.817961
std,104.506364,86.279342,63.997004,21.354219,5.973841,77.753954,80.17598,63.169912,16.705742
min,102.0,0.0,0.0,121.8,0.0,801.0,594.0,1.0,2.33
25%,192.375,0.0,0.0,164.9,0.0,932.0,730.95,7.0,23.71
50%,272.9,22.0,0.0,185.0,6.4,968.0,779.5,28.0,34.445
75%,350.0,142.95,118.3,192.0,10.2,1029.4,824.0,56.0,46.135
max,540.0,359.4,200.1,247.0,32.2,1145.0,992.6,365.0,82.6


In [5]:
concrete_data.isnull().sum()

Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64

The target variable in this problem is the concrete sample strength. Therefore, our predictors will be all the other columns.

In [6]:
concrete_data_columns = concrete_data.columns
predictors = concrete_data[concrete_data_columns[concrete_data_columns != 'Strength']] # all columns except Strength
target = concrete_data['Strength'] # Strength column
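
The boolean-mask selection above can also be written with `DataFrame.drop`. The sketch below uses a small toy frame (three rows copied from the preview above, with only a few of the columns) to show that both forms select the same predictors; the names `toy`, `mask_based`, and `drop_based` are illustrative only.

```python
import pandas as pd

# Toy frame: a few predictor columns plus the 'Strength' target.
toy = pd.DataFrame({
    "Cement": [540.0, 332.5, 198.6],
    "Water": [162.0, 228.0, 192.0],
    "Age": [28, 270, 360],
    "Strength": [79.99, 40.27, 44.30],
})

# Boolean-mask selection, in the style of the cell above.
mask_based = toy[toy.columns[toy.columns != "Strength"]]

# Shorter equivalent: drop the target column by name.
drop_based = toy.drop(columns="Strength")

target_toy = toy["Strength"]
print(mask_based.equals(drop_based))  # True
```

Either form leaves the original frame unchanged, since `drop` returns a new object unless told otherwise.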

Let's do a quick sanity check of the predictors and the target dataframes.

In [7]:
predictors.head()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28
2,332.5,142.5,0.0,228.0,0.0,932.0,594.0,270
3,332.5,142.5,0.0,228.0,0.0,932.0,594.0,365
4,198.6,132.4,0.0,192.0,0.0,978.4,825.5,360


In [8]:
target.head()

0    79.99
1    61.89
2    40.27
3    41.05
4    44.30
Name: Strength, dtype: float64

Finally, the last step is to normalize the data by subtracting the mean and dividing by the standard deviation.

In [9]:
predictors_norm = (predictors - predictors.mean()) / predictors.std()
predictors_norm.head()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age
0,2.476712,-0.856472,-0.846733,-0.916319,-0.620147,0.862735,-1.217079,-0.279597
1,2.476712,-0.856472,-0.846733,-0.916319,-0.620147,1.055651,-1.217079,-0.279597
2,0.491187,0.79514,-0.846733,2.174405,-1.038638,-0.526262,-2.239829,3.55134
3,0.491187,0.79514,-0.846733,2.174405,-1.038638,-0.526262,-2.239829,5.055221
4,-0.790075,0.678079,-0.846733,0.488555,-1.038638,0.070492,0.647569,4.976069
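
The cell above computes the mean and standard deviation over all 1030 rows, before the data is split into training and test sets, so the test rows also influence the scaling. A common alternative, shown here only as a sketch on synthetic data and not as the notebook's method, is to estimate the statistics on the training rows and reuse them for the test rows. The helper names `fit_standardizer` and `standardize` are hypothetical. Note that pandas `std()` uses the sample standard deviation (`ddof=1`) by default, whereas NumPy defaults to `ddof=0`.

```python
import numpy as np

def fit_standardizer(X_train):
    """Return per-column mean and sample standard deviation of the training rows."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0, ddof=1)  # ddof=1 matches the pandas default
    return mean, std

def standardize(X, mean, std):
    """Apply previously fitted statistics to any set of rows."""
    return (X - mean) / std

# Synthetic two-column data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(loc=[280.0, 180.0], scale=[100.0, 20.0], size=(100, 2))
X_train, X_test = X[:70], X[70:]

mean, std = fit_standardizer(X_train)
X_train_norm = standardize(X_train, mean, std)
X_test_norm = standardize(X_test, mean, std)  # uses the training statistics
```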


In [10]:
n_cols = predictors_norm.shape[1] # number of predictors
n_cols

8

<a id="item1"></a>

## Building Keras model

In [11]:
import keras

Keras is used here with the TensorFlow backend.

Let's import the rest of the packages from the Keras library that we will need to build our regression model.

In [12]:
from keras.models import Sequential
from keras.layers import Dense

In [13]:
# define the model itself
def regression_model():
    # build the model
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(1))
    
    # compiling the model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model

The above function creates a model with one hidden layer of 10 neurons and a ReLU activation function, followed by a single linear output neuron. It uses the Adam optimizer and the mean squared error as the loss function.
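
To make the architecture concrete, here is a minimal NumPy sketch of the same computation: 8 inputs, a 10-unit ReLU hidden layer, and one linear output. The weights are random placeholders rather than trained values, and the names `W1`, `b1`, `W2`, `b2`, and `forward` are illustrative only.

```python
import numpy as np

n_inputs, n_hidden = 8, 10  # 8 predictors, 10 hidden neurons

rng = np.random.default_rng(0)
W1 = rng.normal(size=(n_inputs, n_hidden))  # hidden-layer weights (placeholders)
b1 = np.zeros(n_hidden)                     # hidden-layer biases
W2 = rng.normal(size=(n_hidden, 1))         # output-layer weights (placeholders)
b2 = np.zeros(1)                            # output-layer bias

def forward(X):
    """Forward pass: ReLU hidden layer followed by a linear output."""
    hidden = np.maximum(0.0, X @ W1 + b1)
    return hidden @ W2 + b2

# Trainable parameters: (8*10 + 10) + (10*1 + 1)
n_params = W1.size + b1.size + W2.size + b2.size
print(n_params)  # 101

X_demo = rng.normal(size=(5, n_inputs))
print(forward(X_demo).shape)  # (5, 1)
```

With only 101 trainable parameters the network is small relative to the 1030 samples.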

Let's import `train_test_split` from scikit-learn in order to randomly split the data into training and test sets.

In [14]:
from sklearn.model_selection import train_test_split

Split the data into training and test sets, holding out 30% of the data for testing.

In [15]:
X_train, X_test, y_train, y_test = train_test_split(predictors_norm, target, test_size=0.3, random_state=42)
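
As a quick arithmetic check on the split: with 1030 rows and `test_size=0.3`, the test set gets 309 rows and the training set 721. Assuming `model.fit` is called without a `batch_size`, Keras uses 32, which gives 23 batches per epoch; this agrees with the `23/23` shown in the training log further down.

```python
import math

n_samples = 1030   # rows reported by concrete_data.shape
test_size = 0.3    # fraction held out for testing
batch_size = 32    # Keras default when model.fit gets no batch_size

n_test = math.ceil(test_size * n_samples)        # test size is rounded up
n_train = n_samples - n_test                     # the rest is used for training
steps_per_epoch = math.ceil(n_train / batch_size)

print(n_train, n_test, steps_per_epoch)  # 721 309 23
```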

## Train and Test

Let's call the function now to create our model.

In [16]:
# build the model
model = regression_model()

Next, we will train the model for 50 epochs.


In [17]:
# fit the model
epochs = 50
model.fit(X_train, y_train, epochs=epochs, verbose=2)

Epoch 1/50
23/23 - 0s - loss: 1596.2004
Epoch 2/50
23/23 - 0s - loss: 1579.4698
Epoch 3/50
23/23 - 0s - loss: 1563.4880
Epoch 4/50
23/23 - 0s - loss: 1547.4736
Epoch 5/50
23/23 - 0s - loss: 1531.8109
Epoch 6/50
23/23 - 0s - loss: 1515.7946
Epoch 7/50
23/23 - 0s - loss: 1499.7998
Epoch 8/50
23/23 - 0s - loss: 1483.4865
Epoch 9/50
23/23 - 0s - loss: 1466.8881
Epoch 10/50
23/23 - 0s - loss: 1449.5197
Epoch 11/50
23/23 - 0s - loss: 1431.9116
Epoch 12/50
23/23 - 0s - loss: 1413.5826
Epoch 13/50
23/23 - 0s - loss: 1394.4092
Epoch 14/50
23/23 - 0s - loss: 1374.3584
Epoch 15/50
23/23 - 0s - loss: 1353.5099
Epoch 16/50
23/23 - 0s - loss: 1331.6273
Epoch 17/50
23/23 - 0s - loss: 1308.8914
Epoch 18/50
23/23 - 0s - loss: 1284.9707
Epoch 19/50
23/23 - 0s - loss: 1259.9913
Epoch 20/50
23/23 - 0s - loss: 1234.2889
Epoch 21/50
23/23 - 0s - loss: 1207.3663
Epoch 22/50
23/23 - 0s - loss: 1179.6564
Epoch 23/50
23/23 - 0s - loss: 1151.7195
Epoch 24/50
23/23 - 0s - loss: 1122.6000
Epoch 25/50
23/23 - 0s - 

<tensorflow.python.keras.callbacks.History at 0x1414f0fd0>

Next we need to evaluate the model on the test data.

In [18]:
loss_val = model.evaluate(X_test, y_test)
y_pred = model.predict(X_test)
loss_val



362.25006103515625

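Now we compute the mean squared error between the predicted concrete strength and the actual concrete strength with scikit-learn. Because the model's loss is also the mean squared error, the result should match the value returned by `model.evaluate`.

Let's import the mean_squared_error function from scikit-learn.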

In [19]:
from sklearn.metrics import mean_squared_error

In [20]:
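mean_square_error = mean_squared_error(y_test, y_pred)
# mean_square_error is a single number here, so its mean is the value itself
# and its standard deviation is 0; both become meaningful over repeated runs
mean = np.mean(mean_square_error)
standard_deviation = np.std(mean_square_error)
print(mean, standard_deviation)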

362.2500340704751 0.0
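
For reference, the mean squared error is just the average of the squared differences, which can be reproduced in NumPy. The sketch below uses four true values from the preview of the target and made-up predictions. It also shows a shape pitfall: a model ending in `Dense(1)` returns predictions of shape `(n, 1)`, and subtracting that from a 1-D array of shape `(n,)` broadcasts to an `(n, n)` matrix instead of computing element-wise differences.

```python
import numpy as np

y_true = np.array([79.99, 61.89, 40.27, 41.05])      # shape (4,)
y_pred = np.array([[70.0], [60.0], [45.0], [40.0]])  # shape (4, 1), made-up predictions

# Pitfall: (4,) minus (4, 1) broadcasts to all 16 pairwise differences.
wrong = np.mean((y_true - y_pred) ** 2)

# Flatten the predictions first so the subtraction is element-wise.
mse = np.mean((y_true - y_pred.ravel()) ** 2)

print((y_true - y_pred).shape)  # (4, 4)
print(np.std(mse))              # 0.0, a single number has no spread
```

The last line is why the standard deviation printed above is `0.0`: it is taken over a single value.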


Next, repeat the split, training, and evaluation 50 times, collect the 50 mean squared errors in a list, and report their mean and standard deviation.
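
One design point for such a repetition: if the runs are meant to be independent, each one needs a freshly built model. The loop that follows calls `model.fit` on the same `model` object every time, so training continues from the previous weights, and rows that were test rows in one run can be training rows in the next. The steadily falling MSE values in its output are consistent with that. The sketch below shows the independent-runs pattern on synthetic data. `LeastSquaresModel` is a hypothetical NumPy stand-in for a network, and `repeated_mse` is an illustrative helper, not part of any library.

```python
import numpy as np

class LeastSquaresModel:
    """Hypothetical stand-in for a freshly built network (fit/predict only)."""

    def fit(self, X, y):
        A = np.column_stack([X, np.ones(len(X))])  # add an intercept column
        self.coef_, *_ = np.linalg.lstsq(A, y, rcond=None)
        return self

    def predict(self, X):
        return np.column_stack([X, np.ones(len(X))]) @ self.coef_


def repeated_mse(build_model, X, y, n_runs=50, test_fraction=0.3):
    """Train a fresh model on n_runs random splits; return mean and std of test MSE."""
    errors = []
    for seed in range(n_runs):
        order = np.random.default_rng(seed).permutation(len(X))
        n_test = int(np.ceil(test_fraction * len(X)))
        test_idx, train_idx = order[:n_test], order[n_test:]
        model = build_model()  # new, untrained model for every repetition
        model.fit(X[train_idx], y[train_idx])
        residual = y[test_idx] - model.predict(X[test_idx])
        errors.append(np.mean(residual ** 2))
    errors = np.array(errors)
    return errors.mean(), errors.std()


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)

mean_mse, std_mse = repeated_mse(LeastSquaresModel, X, y, n_runs=10)
print(mean_mse, std_mse)
```

With the notebook's names, the analogous change would be to call `regression_model()` inside the loop before `model.fit`.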

In [21]:
total_mean_squared_errors = 50
epochs = 50
mean_squared_errors = []
for i in range(0, total_mean_squared_errors):
    # new random 70/30 split for every repetition
    X_train, X_test, y_train, y_test = train_test_split(predictors_norm, target, test_size=0.3, random_state=i)
    # NOTE: the same model object is reused, so training continues from the previous
    # repetition's weights; call regression_model() here for independent runs
    model.fit(X_train, y_train, epochs=epochs, verbose=0)
    MSE = model.evaluate(X_test, y_test, verbose=0)
    print("MSE "+str(i+1)+": "+str(MSE))
    y_pred = model.predict(X_test)
    mean_square_error = mean_squared_error(y_test, y_pred)
    mean_squared_errors.append(mean_square_error)

mean_squared_errors = np.array(mean_squared_errors)
mean = np.mean(mean_squared_errors)
standard_deviation = np.std(mean_squared_errors)

print('\n')
print("Following is the mean and standard deviation of " +str(total_mean_squared_errors) + " mean squared errors with normalized data. Total number of epochs for each training is: " +str(epochs) + "\n")
print("Mean: "+str(mean))
print("Standard Deviation: "+str(standard_deviation))

MSE 1: 153.38986206054688
MSE 2: 153.4390869140625
MSE 3: 96.38074493408203
MSE 4: 78.32361602783203
MSE 5: 65.34627532958984
MSE 6: 61.67755126953125
MSE 7: 57.96314239501953
MSE 8: 41.47554397583008
MSE 9: 45.30984878540039
MSE 10: 44.07246780395508
MSE 11: 41.68468475341797
MSE 12: 39.6212158203125
MSE 13: 46.25810241699219
MSE 14: 46.73246765136719
MSE 15: 41.2974739074707
MSE 16: 35.32357406616211
MSE 17: 38.54859161376953
MSE 18: 37.90545654296875
MSE 19: 37.958168029785156
MSE 20: 40.7080192565918
MSE 21: 33.85319137573242
MSE 22: 36.25786209106445
MSE 23: 32.86945343017578
MSE 24: 36.686737060546875
MSE 25: 37.282676696777344
MSE 26: 39.9595947265625
MSE 27: 34.33341979980469
MSE 28: 34.53462600708008
MSE 29: 40.26103591918945
MSE 30: 38.997657775878906
MSE 31: 37.00861740112305
MSE 32: 32.89824676513672
MSE 33: 32.99641799926758
MSE 34: 39.24507141113281
MSE 35: 38.321598052978516
MSE 36: 42.83156967163086
MSE 37: 34.13733673095703
MSE 38: 39.21443176269531
MSE 39: 34.36526107