## A. Build a baseline model

Download and Clean Dataset

In [74]:
import pandas as pd
import numpy as np

In [75]:
concrete_data = pd.read_csv('https://cocl.us/concrete_data/concrete_data.csv')
concrete_data.head()

index,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age,Strength
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28,79.99
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28,61.89
2,332.5,142.5,0.0,228.0,0.0,932.0,594.0,270,40.27
3,332.5,142.5,0.0,228.0,0.0,932.0,594.0,365,41.05
4,198.6,132.4,0.0,192.0,0.0,978.4,825.5,360,44.3


Check the number of data points

In [76]:
concrete_data.shape

(1030, 9)

Check for missing data

In [77]:
concrete_data.describe()

statistic,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age,Strength
count,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0
mean,281.167864,73.895825,54.18835,181.567282,6.20466,972.918932,773.580485,45.662136,35.817961
std,104.506364,86.279342,63.997004,21.354219,5.973841,77.753954,80.17598,63.169912,16.705742
min,102.0,0.0,0.0,121.8,0.0,801.0,594.0,1.0,2.33
25%,192.375,0.0,0.0,164.9,0.0,932.0,730.95,7.0,23.71
50%,272.9,22.0,0.0,185.0,6.4,968.0,779.5,28.0,34.445
75%,350.0,142.95,118.3,192.0,10.2,1029.4,824.0,56.0,46.135
max,540.0,359.4,200.1,247.0,32.2,1145.0,992.6,365.0,82.6


In [78]:
concrete_data.isnull().sum()

Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64

Split data into predictors and target

In [79]:
concrete_data_columns = concrete_data.columns

predictors_original = concrete_data[concrete_data_columns[concrete_data_columns != 'Strength']] # all columns except Strength
target = concrete_data['Strength'] # Strength column


predictors=predictors_original.drop(['Age'], axis=1)

predictors.head()


index,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0
2,332.5,142.5,0.0,228.0,0.0,932.0,594.0
3,332.5,142.5,0.0,228.0,0.0,932.0,594.0
4,198.6,132.4,0.0,192.0,0.0,978.4,825.5


In [80]:
target.head()

0    79.99
1    61.89
2    40.27
3    41.05
4    44.30
Name: Strength, dtype: float64

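Save the number of predictors as n_cols, which the input layer of the network needs. The baseline model uses the raw predictors; normalization (subtracting the mean and dividing by the standard deviation) is applied in Part B.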

In [81]:
n_cols = predictors.shape[1] # number of predictors
n_cols

7

Import Keras

In [82]:
import keras

In [83]:
from keras.models import Sequential
from keras.layers import Dense

1. Randomly split the data into training and test sets by holding out 30% of the data for testing. You can use the train_test_split helper function from Scikit-learn.

In [84]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(predictors, target, test_size=0.3, random_state=1)
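To make the hold-out step concrete, here is a minimal NumPy sketch of the idea behind a random 70/30 split. The function name `holdout_split` is hypothetical, and this is not Scikit-learn's implementation, so it will not reproduce the exact rows that `train_test_split(..., random_state=1)` selects.

```python
import numpy as np

def holdout_split(n_rows, test_size=0.3, seed=0):
    """Return (train_idx, test_idx) row positions for a random hold-out split."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_rows)        # random ordering of row positions
    n_test = int(round(test_size * n_rows))   # number of rows held out for testing
    return shuffled[n_test:], shuffled[:n_test]

# 1030 rows, as reported by concrete_data.shape
train_idx, test_idx = holdout_split(1030, test_size=0.3, seed=1)
print(len(train_idx), len(test_idx))
```

Every row lands in exactly one of the two sets, which is the property the evaluation in step 3 relies on.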

Build Neural Network

In [85]:
# define regression model
def regression_model():
    # create model
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(1))
    
    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model
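As a rough illustration of what this network computes, the sketch below reproduces the forward pass in plain NumPy: one hidden layer of 10 ReLU units followed by a single linear output. The weights are random placeholders and the helper names (`count_dense_params`, `forward`) are hypothetical; nothing here is Keras code.

```python
import numpy as np

def count_dense_params(n_inputs, n_units):
    """Weights plus biases of one fully connected layer."""
    return n_inputs * n_units + n_units

def forward(x, w1, b1, w2, b2):
    """Hidden layer with ReLU, then a linear output layer."""
    hidden = np.maximum(0.0, x @ w1 + b1)
    return hidden @ w2 + b2

n_inputs = 7  # n_cols in the notebook
total_params = count_dense_params(n_inputs, 10) + count_dense_params(10, 1)

rng = np.random.default_rng(0)
w1, b1 = rng.normal(size=(n_inputs, 10)), np.zeros(10)
w2, b2 = rng.normal(size=(10, 1)), np.zeros(1)
batch = rng.normal(size=(4, n_inputs))    # four made-up input rows
out = forward(batch, w1, b1, w2, b2)
print(total_params, out.shape)
```

With 7 inputs the two layers hold 80 + 11 trainable parameters, and the output has one column per sample, which is also the shape that `model.predict` returns.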

Train and Test Network

In [86]:
# build the model
model = regression_model()

2. Train the model on the training data using 50 epochs.

In [93]:
model.fit(X_train, y_train, epochs=50, verbose = 2)

Epoch 1/50
 - 0s - loss: 153.8000
Epoch 2/50
 - 2s - loss: 146.0632
Epoch 3/50
 - 0s - loss: 145.8434
Epoch 4/50
 - 0s - loss: 147.6798
Epoch 5/50
 - 0s - loss: 144.8981
Epoch 6/50
 - 0s - loss: 140.5029
Epoch 7/50
 - 0s - loss: 144.3264
Epoch 8/50
 - 0s - loss: 141.6820
Epoch 9/50
 - 0s - loss: 141.9988
Epoch 10/50
 - 0s - loss: 141.8303
Epoch 11/50
 - 0s - loss: 145.6570
Epoch 12/50
 - 0s - loss: 142.6933
Epoch 13/50
 - 0s - loss: 143.1172
Epoch 14/50
 - 0s - loss: 141.1026
Epoch 15/50
 - 0s - loss: 149.7832
Epoch 16/50
 - 0s - loss: 140.7436
Epoch 17/50
 - 0s - loss: 140.2203
Epoch 18/50
 - 0s - loss: 143.1115
Epoch 19/50
 - 0s - loss: 143.0755
Epoch 20/50
 - 0s - loss: 143.8451
Epoch 21/50
 - 0s - loss: 146.3962
Epoch 22/50
 - 0s - loss: 147.6137
Epoch 23/50
 - 0s - loss: 150.8606
Epoch 24/50
 - 0s - loss: 146.7634
Epoch 25/50
 - 0s - loss: 148.4674
Epoch 26/50
 - 0s - loss: 151.0694
Epoch 27/50
 - 0s - loss: 142.6530
Epoch 28/50
 - 0s - loss: 142.0051
Epoch 29/50
 - 0s - loss: 146

<keras.callbacks.History at 0x7f970c0ebbe0>

3. Evaluate the model on the test data and compute the mean squared error between the predicted concrete strength and the actual concrete strength. You can use the mean_squared_error function from Scikit-learn.

In [88]:
y_pred = model.predict(X_test)

In [90]:
from sklearn.metrics import mean_squared_error
mean_squared_error(y_test, y_pred)

187.40725022151216
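For reference, the mean squared error is the average of the squared differences between actual and predicted values. The sketch below computes it by hand with NumPy on made-up numbers. It also shows a shape pitfall: predictions arrive as a column of shape (n, 1), so subtracting them from a 1-D NumPy array without flattening broadcasts to an (n, n) matrix. The helper name `mse` is hypothetical.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error after flattening both inputs to 1-D."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return float(np.mean((y_true - y_pred) ** 2))

y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([[2.0], [5.0], [9.0]])   # column vector, like a predict() result

print(mse(y_true, y_pred))
print((y_true - y_pred).shape)             # unflattened subtraction broadcasts
```

Scikit-learn's `mean_squared_error` accepts the column-shaped predictions directly, so the notebook cell above does not need the explicit flattening.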

4. Repeat steps 1 - 3, 50 times, i.e., create a list of 50 mean squared errors.

In [91]:
MSE = []

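for i in range(0, 50):
    # Step 1: draw a new random 70/30 split for this repetition
    X_train, X_test, y_train, y_test = train_test_split(predictors, target, test_size=0.3, random_state=i)
    # NOTE: `model` is the network already trained above and is not rebuilt here,
    # so its weights carry over from one repetition to the next.
    model.fit(X_train, y_train, epochs=50, verbose=0)
    # Step 3: evaluate on the held-out rows
    y_pred = model.predict(X_test)
    mean_square_error = mean_squared_error(y_test, y_pred)
    print("Mean Squared Errors "+str(i+1)+": "+str(mean_square_error ))
    MSE.append(mean_square_error)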

Mean Squared Errors 1: 147.21908908945682
Mean Squared Errors 2: 170.77864563630808
Mean Squared Errors 3: 165.83656691615437
Mean Squared Errors 4: 193.1965761551997
Mean Squared Errors 5: 206.1529628757807
Mean Squared Errors 6: 148.6784562318663
Mean Squared Errors 7: 177.38221390982457
Mean Squared Errors 8: 169.46838921630862
Mean Squared Errors 9: 166.58611486085871
Mean Squared Errors 10: 181.590903973799
Mean Squared Errors 11: 152.12546229670002
Mean Squared Errors 12: 139.82840763929917
Mean Squared Errors 13: 153.89712818284684
Mean Squared Errors 14: 180.1915665174726
Mean Squared Errors 15: 148.81992086906592
Mean Squared Errors 16: 133.1968741141545
Mean Squared Errors 17: 144.35536205638215
Mean Squared Errors 18: 128.31863949633725
Mean Squared Errors 19: 135.76382939449712
Mean Squared Errors 20: 162.66676621290122
Mean Squared Errors 21: 140.95777869681814
Mean Squared Errors 22: 146.55874716743668
Mean Squared Errors 23: 148.4929877714111
Mean Squared Errors 24: 147.
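One way to make the 50 repetitions independent is to build a fresh model inside the loop, so that no run starts from weights that have already seen another split's test rows. The sketch below shows that pattern with a hypothetical stand-in model (`LeastSquaresModel`, ordinary least squares) and synthetic data so that it runs without Keras; in the notebook, `build_model` would correspond to `regression_model`. It illustrates the loop structure only and does not reproduce the numbers above.

```python
import numpy as np

class LeastSquaresModel:
    """Hypothetical stand-in for the Keras network: ordinary least squares."""
    def fit(self, X, y):
        X1 = np.column_stack([X, np.ones(len(X))])      # add intercept column
        self.coef_, *_ = np.linalg.lstsq(X1, y, rcond=None)
        return self

    def predict(self, X):
        return np.column_stack([X, np.ones(len(X))]) @ self.coef_

def repeated_holdout(build_model, X, y, n_runs=50, test_size=0.3):
    """Return one test MSE per run, training a new model on each split."""
    scores = []
    for seed in range(n_runs):
        order = np.random.default_rng(seed).permutation(len(X))
        n_test = int(round(test_size * len(X)))
        test, train = order[:n_test], order[n_test:]
        model = build_model()                  # fresh, untrained model every run
        model.fit(X[train], y[train])
        errors = y[test] - model.predict(X[test])
        scores.append(float(np.mean(errors ** 2)))
    return np.array(scores)

# Synthetic data, purely for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

scores = repeated_holdout(LeastSquaresModel, X, y, n_runs=5)
print(scores.mean(), scores.std())
```

Passing a factory rather than a model instance keeps the reset explicit and makes it hard to reuse trained weights by accident.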

In [94]:
MSE = np.array(MSE)
meanMSE = np.mean(MSE)
stdevMSE = np.std(MSE)

print("mean of mean squared errors.: "+str(meanMSE))
print("standard deviation of mean squared errors.: "+str(stdevMSE))

mean of mean squared errors.: 153.2223806174302
standard deviation of mean squared errors.: 16.350439942905588
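Note that `np.std` defaults to the population formula (`ddof=0`), whereas pandas' `.std()` defaults to the sample formula (`ddof=1`). With 50 values the difference is small, but it is worth knowing which one is being reported. A short sketch on made-up numbers:

```python
import numpy as np

scores = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # made-up values

population_std = np.std(scores)        # divides by n (NumPy default, ddof=0)
sample_std = np.std(scores, ddof=1)    # divides by n - 1 (pandas default)
print(population_std, sample_std)
```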


## B. Normalize the data

In [44]:
target_norm = (target-target.mean())/target.std()
target_norm.head()

0    2.644123
1    1.560663
2    0.266498
3    0.313188
4    0.507732
Name: Strength, dtype: float64

In [95]:
predictors_norm = (predictors - predictors.mean()) / predictors.std()
predictors_norm.head()

index,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate
0,2.476712,-0.856472,-0.846733,-0.916319,-0.620147,0.862735,-1.217079
1,2.476712,-0.856472,-0.846733,-0.916319,-0.620147,1.055651,-1.217079
2,0.491187,0.79514,-0.846733,2.174405,-1.038638,-0.526262,-2.239829
3,0.491187,0.79514,-0.846733,2.174405,-1.038638,-0.526262,-2.239829
4,-0.790075,0.678079,-0.846733,0.488555,-1.038638,0.070492,0.647569
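The cell above standardizes with the mean and standard deviation of the full dataset, which includes the rows later held out for testing. A common alternative, sketched here under that assumption, is to compute the statistics on the training rows only and reuse them for the test rows. The helper names (`fit_standardizer`, `standardize`) and the synthetic data are hypothetical.

```python
import numpy as np

def fit_standardizer(X_train):
    """Column means and standard deviations from the training rows only."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0, ddof=1)   # ddof=1 matches pandas' default .std()
    return mean, std

def standardize(X, mean, std):
    """Subtract the mean and divide by the standard deviation, column-wise."""
    return (X - mean) / std

# Synthetic two-column data, purely for illustration
rng = np.random.default_rng(0)
X = rng.normal(loc=[280.0, 180.0], scale=[100.0, 20.0], size=(100, 2))
X_train, X_test = X[:70], X[70:]

mean, std = fit_standardizer(X_train)
X_train_norm = standardize(X_train, mean, std)
X_test_norm = standardize(X_test, mean, std)   # reuses the training statistics
print(X_train_norm.mean(axis=0), X_train_norm.std(axis=0, ddof=1))
```

The training columns end up with mean 0 and standard deviation 1; the test columns will be close to that but not exact, which is expected.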


1. Randomly split the data into training and test sets by holding out 30% of the data for testing. You can use the train_test_split helper function from Scikit-learn.

In [96]:
X_norm_train, X_norm_test, y_train, y_test = train_test_split(predictors_norm, target, test_size=0.3, random_state=1)

2. Train the model on the training data using 50 epochs.

In [97]:
# build the model
model_norm = regression_model()

In [98]:
model_norm.fit(X_norm_train, y_train, epochs=50, verbose = 2)

Epoch 1/50
 - 1s - loss: 1523.8225
Epoch 2/50
 - 0s - loss: 1506.2397
Epoch 3/50
 - 0s - loss: 1488.5263
Epoch 4/50
 - 0s - loss: 1470.5080
Epoch 5/50
 - 0s - loss: 1452.0606
Epoch 6/50
 - 0s - loss: 1432.9811
Epoch 7/50
 - 0s - loss: 1413.4622
Epoch 8/50
 - 0s - loss: 1392.8337
Epoch 9/50
 - 0s - loss: 1371.4814
Epoch 10/50
 - 0s - loss: 1349.5119
Epoch 11/50
 - 0s - loss: 1326.4411
Epoch 12/50
 - 0s - loss: 1302.6084
Epoch 13/50
 - 0s - loss: 1277.2249
Epoch 14/50
 - 0s - loss: 1250.6801
Epoch 15/50
 - 0s - loss: 1223.2902
Epoch 16/50
 - 0s - loss: 1194.1211
Epoch 17/50
 - 0s - loss: 1164.7230
Epoch 18/50
 - 0s - loss: 1134.0171
Epoch 19/50
 - 0s - loss: 1102.5256
Epoch 20/50
 - 0s - loss: 1070.2239
Epoch 21/50
 - 0s - loss: 1037.4841
Epoch 22/50
 - 0s - loss: 1004.6032
Epoch 23/50
 - 0s - loss: 971.2520
Epoch 24/50
 - 0s - loss: 937.9457
Epoch 25/50
 - 0s - loss: 904.4083
Epoch 26/50
 - 0s - loss: 870.6850
Epoch 27/50
 - 0s - loss: 837.8361
Epoch 28/50
 - 0s - loss: 804.6731
Epoch 2

<keras.callbacks.History at 0x7f970c0452e8>
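The first-epoch loss of roughly 1,500 is plausibly explained by the target being left unnormalized: a freshly initialized network on standardized inputs tends to predict values near zero, and the MSE of predicting zero is approximately the squared mean of the target plus its variance. The arithmetic below uses the Strength statistics from `describe()` above; the interpretation is an assumption, not something the notebook verifies.

```python
# Strength statistics copied from the describe() output
target_mean = 35.817961
target_std = 16.705742

# Approximate MSE of always predicting 0: mean^2 + variance
zero_prediction_mse = target_mean ** 2 + target_std ** 2

# Approximate MSE of always predicting the mean: the variance alone
mean_prediction_mse = target_std ** 2

print(round(zero_prediction_mse, 1), round(mean_prediction_mse, 1))
```

The second figure gives a useful yardstick: a test MSE well below the target variance means the model does better than always predicting the average strength.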

3. Evaluate the model on the test data and compute the mean squared error between the predicted concrete strength and the actual concrete strength. You can use the mean_squared_error function from Scikit-learn.

In [103]:
y_pred_norm = model_norm.predict(X_norm_test)

In [104]:
from sklearn.metrics import mean_squared_error
mean_squared_error(y_test, y_pred_norm)

315.4913759441846

4. Repeat steps 1 - 3, 50 times, i.e., create a list of 50 mean squared errors.

In [114]:
MSE_norm = []

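for i in range(0, 50):
    # Step 1: draw a new random 70/30 split of the normalised predictors
    X_norm_train, X_norm_test, y_train, y_test = train_test_split(predictors_norm, target, test_size=0.3, random_state=i)
    # NOTE: `model_norm` is the network already trained above and is not rebuilt here,
    # so its weights carry over from one repetition to the next.
    model_norm.fit(X_norm_train, y_train, epochs=50, verbose=0)
    # Step 3: evaluate on the held-out rows
    y_pred_norm = model_norm.predict(X_norm_test)
    mean_square_error = mean_squared_error(y_test, y_pred_norm)
    print("Mean Squared Errors for normalised predictors "+str(i+1)+": "+str(mean_square_error ))
    MSE_norm.append(mean_square_error)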

Mean Squared Errors for normalised predictors 1: 113.33687475658795
Mean Squared Errors for normalised predictors 2: 123.05701891458105
Mean Squared Errors for normalised predictors 3: 135.77477703516297
Mean Squared Errors for normalised predictors 4: 143.08386684493743
Mean Squared Errors for normalised predictors 5: 137.92025329057415
Mean Squared Errors for normalised predictors 6: 105.27905131483374
Mean Squared Errors for normalised predictors 7: 139.30877442344706
Mean Squared Errors for normalised predictors 8: 116.01000865936578
Mean Squared Errors for normalised predictors 9: 131.96574309410929
Mean Squared Errors for normalised predictors 10: 130.8008610020763
Mean Squared Errors for normalised predictors 11: 123.025984460049
Mean Squared Errors for normalised predictors 12: 118.09301788195165
Mean Squared Errors for normalised predictors 13: 114.8556024874869
Mean Squared Errors for normalised predictors 14: 127.73030573255653
Mean Squared Errors for normalised predictors 1

In [115]:
MSE_norm = np.array(MSE_norm)
meanMSE_norm = np.mean(MSE_norm)
stdevMSE_norm = np.std(MSE_norm)

print("mean of mean squared errors with normalised predictors : "+str(meanMSE_norm))
print("standard deviation of mean squared errors with normalised predictors : "+str(stdevMSE_norm))

mean of mean squared errors with normalised predictors : 121.86839391659122
standard deviation of mean squared errors with normalised predictors : 9.273328333374813


### How does the mean of the mean squared errors compare to that from Step A?

With the original data, the mean of the MSEs is 153.22 and the standard deviation is 16.35.
With normalised predictors, the mean of the MSEs decreased to 121.87 and the standard deviation decreased to 9.27.

With normalised predictors, the mean of the mean squared errors in Step B is lower than that in Step A.

## C. Increase the number of epochs to 100 for training, repeat Part B

2. Train the model on the training data using 100 epochs.

In [116]:
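# Step 1 is not repeated in this cell: X_norm_train, y_train, X_norm_test and y_test
# still hold the last split produced by the Part B loop (random_state=49).
model_norm_100 = regression_model()

model_norm_100.fit(X_norm_train, y_train, epochs=100, verbose = 2)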

Epoch 1/100
 - 1s - loss: 1585.6552
Epoch 2/100
 - 0s - loss: 1568.2863
Epoch 3/100
 - 0s - loss: 1550.6186
Epoch 4/100
 - 0s - loss: 1533.0341
Epoch 5/100
 - 0s - loss: 1515.2242
Epoch 6/100
 - 0s - loss: 1497.1624
Epoch 7/100
 - 0s - loss: 1479.1096
Epoch 8/100
 - 0s - loss: 1460.6509
Epoch 9/100
 - 0s - loss: 1441.4395
Epoch 10/100
 - 0s - loss: 1421.5909
Epoch 11/100
 - 0s - loss: 1401.2129
Epoch 12/100
 - 0s - loss: 1380.2374
Epoch 13/100
 - 0s - loss: 1358.6043
Epoch 14/100
 - 0s - loss: 1336.3910
Epoch 15/100
 - 0s - loss: 1313.5046
Epoch 16/100
 - 0s - loss: 1290.1337
Epoch 17/100
 - 0s - loss: 1266.0712
Epoch 18/100
 - 0s - loss: 1240.9391
Epoch 19/100
 - 0s - loss: 1215.4129
Epoch 20/100
 - 0s - loss: 1188.6407
Epoch 21/100
 - 0s - loss: 1161.5664
Epoch 22/100
 - 0s - loss: 1133.4698
Epoch 23/100
 - 0s - loss: 1105.4324
Epoch 24/100
 - 0s - loss: 1076.6273
Epoch 25/100
 - 0s - loss: 1047.5740
Epoch 26/100
 - 0s - loss: 1017.7968
Epoch 27/100
 - 0s - loss: 987.9217
Epoch 28/10

<keras.callbacks.History at 0x7f96cc176198>

3. Evaluate the model on the test data and compute the mean squared error between the predicted concrete strength and the actual concrete strength. You can use the mean_squared_error function from Scikit-learn.

In [117]:
y_pred_norm_100 = model_norm_100.predict(X_norm_test)

In [118]:
mean_squared_error(y_test, y_pred_norm_100)

196.3015919957497

4. Repeat steps 1 - 3, 50 times, i.e., create a list of 50 mean squared errors.

In [119]:
MSE_norm_100 = []

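for i in range(0, 50):
    # Step 1: draw a new random 70/30 split of the normalised predictors
    X_norm_train, X_norm_test, y_train, y_test = train_test_split(predictors_norm, target, test_size=0.3, random_state=i)
    # NOTE: `model_norm_100` is the network already trained above and is not rebuilt here,
    # so its weights carry over from one repetition to the next.
    model_norm_100.fit(X_norm_train, y_train, epochs=100, verbose=0)
    # Step 3: evaluate on the held-out rows
    y_pred_norm_100 = model_norm_100.predict(X_norm_test)
    mean_square_error = mean_squared_error(y_test, y_pred_norm_100)
    print("Mean Squared Errors for normalised predictors "+str(i+1)+": "+str(mean_square_error ))
    MSE_norm_100.append(mean_square_error)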

Mean Squared Errors for normalised predictors 1: 146.51750500505528
Mean Squared Errors for normalised predictors 2: 148.1914934871058
Mean Squared Errors for normalised predictors 3: 148.04445390029744
Mean Squared Errors for normalised predictors 4: 161.89726210219746
Mean Squared Errors for normalised predictors 5: 153.1681926982695
Mean Squared Errors for normalised predictors 6: 121.29631585391425
Mean Squared Errors for normalised predictors 7: 156.73130728408174
Mean Squared Errors for normalised predictors 8: 130.47661608177734
Mean Squared Errors for normalised predictors 9: 145.4140247161987
Mean Squared Errors for normalised predictors 10: 143.23655237067945
Mean Squared Errors for normalised predictors 11: 131.6557069433074
Mean Squared Errors for normalised predictors 12: 126.12721730781391
Mean Squared Errors for normalised predictors 13: 128.07462309672565
Mean Squared Errors for normalised predictors 14: 142.88589396955803
Mean Squared Errors for normalised predictors 1

In [120]:
MSE_norm_100 = np.array(MSE_norm_100)
meanMSE_norm_100 = np.mean(MSE_norm_100)
stdevMSE_norm_100 = np.std(MSE_norm_100)

print("mean of mean squared errors with normalised predictors with epoch = 100 : "+str(meanMSE_norm_100))
print("standard deviation of mean squared errors with normalised predictors with epoch = 100 : "+str(stdevMSE_norm_100))

mean of mean squared errors with normalised predictors with epoch = 100 : 132.2296968606599
standard deviation of mean squared errors with normalised predictors with epoch = 100 : 11.364451589923386


### How does the mean of the mean squared errors compare to that from Step B?

The mean of mean squared errors in Step C (with epoch = 100) is 132.23, larger than that from Step B at 121.87.