# Intro to DL&NN Final Project (A)

Use the Keras library to build a neural network with the following:

- One hidden layer of 10 nodes, and a ReLU activation function

- Use the Adam optimizer and the mean squared error as the loss function.

1. Randomly split the data into training and test sets, holding out 30% of the data for testing. You can use the train_test_split helper function from Scikit-learn.

2. Train the model on the training data using 50 epochs.

3. Evaluate the model on the test data and compute the mean squared error between the predicted concrete strength and the actual concrete strength. You can use the mean_squared_error function from Scikit-learn.

4. Repeat steps 1-3 50 times, i.e., create a list of 50 mean squared errors.

5. Report the mean and the standard deviation of the mean squared errors.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow
import tensorflow.keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import to_categorical
%matplotlib inline

## Reading and Exploring Data

In [2]:
conc_df=pd.read_csv('concrete_data.csv')
conc_df.head(4)

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 | 79.99 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 | 61.89 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 40.27 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 | 41.05 |


In [3]:
conc_df.shape

(1030, 9)

In [4]:
conc_df.isna().sum()

Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64

In [5]:
conc_df.describe().transpose()

| | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Cement | 1030.0 | 281.167864 | 104.506364 | 102.0 | 192.375 | 272.9 | 350.0 | 540.0 |
| Blast Furnace Slag | 1030.0 | 73.895825 | 86.279342 | 0.0 | 0.0 | 22.0 | 142.95 | 359.4 |
| Fly Ash | 1030.0 | 54.18835 | 63.997004 | 0.0 | 0.0 | 0.0 | 118.3 | 200.1 |
| Water | 1030.0 | 181.567282 | 21.354219 | 121.8 | 164.9 | 185.0 | 192.0 | 247.0 |
| Superplasticizer | 1030.0 | 6.20466 | 5.973841 | 0.0 | 0.0 | 6.4 | 10.2 | 32.2 |
| Coarse Aggregate | 1030.0 | 972.918932 | 77.753954 | 801.0 | 932.0 | 968.0 | 1029.4 | 1145.0 |
| Fine Aggregate | 1030.0 | 773.580485 | 80.17598 | 594.0 | 730.95 | 779.5 | 824.0 | 992.6 |
| Age | 1030.0 | 45.662136 | 63.169912 | 1.0 | 7.0 | 28.0 | 56.0 | 365.0 |
| Strength | 1030.0 | 35.817961 | 16.705742 | 2.33 | 23.71 | 34.445 | 46.135 | 82.6 |


In [6]:
conc_df.columns

Index(['Cement', 'Blast Furnace Slag', 'Fly Ash', 'Water', 'Superplasticizer',
       'Coarse Aggregate', 'Fine Aggregate', 'Age', 'Strength'],
      dtype='object')

In [7]:
conc_df_columns=conc_df.columns
predictors=conc_df[conc_df_columns[conc_df_columns!='Strength']]
target=conc_df['Strength']

In [8]:
predictors.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 |


In [9]:
target.head()

0    79.99
1    61.89
2    40.27
3    41.05
4    44.30
Name: Strength, dtype: float64

In [10]:
n_cols=predictors.shape[1]
n_cols

8

## Modelling

In [11]:
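def regression_model():
    # One hidden layer of 10 ReLU units, then a single linear output for regression.
    model=Sequential()
    model.add(Dense(10,activation='relu',input_shape=(n_cols,)))
    model.add(Dense(1))
    # Adam optimizer with mean squared error loss, as the assignment specifies.
    model.compile(optimizer='adam',loss='mean_squared_error')
    return model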
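
As a sanity check on this architecture, the number of trainable parameters can be worked out by hand: each dense layer has one weight per input-output pair plus one bias per output unit. The sketch below does that arithmetic for the 8 → 10 → 1 network; `dense_params` is a hypothetical helper written for this illustration, not a Keras function.

```python
# Count trainable parameters of a fully connected network by hand.
# Layer sizes follow the notebook: 8 inputs -> 10 hidden units -> 1 output.
layer_sizes = [8, 10, 1]


def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out


# Pair each layer size with the next one to get (inputs, outputs) per layer.
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total_params = sum(per_layer)

print(per_layer, total_params)  # [90, 11] 101
```

Calling `model.summary()` on the compiled model should list the same per-layer counts.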

In [12]:
model=regression_model()

## Train Test Split

In [27]:
from sklearn.model_selection import train_test_split
X_train,X_test,y_train,y_test=train_test_split(predictors,target,test_size=0.3,random_state=80)
print(f'Train Set Shapes - X : {X_train.shape} y : {y_train.shape}')
print(f'Test Set Shapes - X : {X_test.shape} y : {y_test.shape}')

Train Set Shapes - X : (721, 8) y : (721,)
Test Set Shapes - X : (309, 8) y : (309,)
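
The 721/309 split can be reproduced with a line of arithmetic. The sketch assumes that the test count is the requested fraction of the rows rounded up and that the remaining rows go to training, which matches the shapes printed above.

```python
import math

n_samples = 1030   # rows reported by conc_df.shape
test_size = 0.3    # fraction held out for testing

# Assumption: test count = fraction of rows rounded up; the rest is training data.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)  # 721 309
```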


In [28]:
epochs=50
model.fit(X_train,y_train,epochs=epochs,verbose=1)

Epoch 1/50
...
Epoch 50/50


<tensorflow.python.keras.callbacks.History at 0x1b0f5fb3280>

In [29]:
loss=model.evaluate(X_test,y_test)
y_pred=model.predict(X_test)
loss



103.99345397949219

In [30]:
from sklearn.metrics import mean_squared_error

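mse=mean_squared_error(y_test,y_pred)
# mse is a single scalar here, so its mean is the value itself and its
# standard deviation is 0; the spread only becomes meaningful over repeated runs.
mean=np.mean(mse)
std_dev=np.std(mse)
print(mean,std_dev)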

103.99343380476337 0.0
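
The mean squared error is the average of the squared differences between actual and predicted values. A minimal NumPy sketch of that computation follows. `mse` is a hypothetical helper for illustration, and the arrays are made-up values shaped like the notebook's: a 1-D target and a one-column prediction array such as `model.predict` returns for a single output unit.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean of squared differences; both inputs are flattened to 1-D first."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return float(np.mean((y_true - y_pred) ** 2))


y_true = np.array([3.0, 5.0, 7.0])          # shape (3,)
y_pred = np.array([[2.0], [5.0], [9.0]])    # shape (3, 1)

print(mse(y_true, y_pred))                  # (1 + 0 + 4) / 3

# Without flattening, (3,) - (3, 1) broadcasts to a (3, 3) matrix of differences,
# which would give a wrong average.
print((y_true - y_pred).shape)              # (3, 3)
```

Flattening matters only when computing the metric by hand. The Scikit-learn call in the cell above accepted the two shapes directly, and its result agrees with the Keras loss to about five significant digits.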


## Repeating the Experiment 50 Times

In [35]:
total_mse=50
epochs=50
mse=[]

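for i in range(total_mse):
    # A new random split each iteration (the seed is the iteration index).
    X_train,X_test,y_train,y_test=train_test_split(predictors,target,test_size=0.3,random_state=i)
    # Note: the same `model` object is reused, so its weights carry over from one
    # iteration to the next. Calling model=regression_model() here would make the
    # 50 runs independent.
    model.fit(X_train,y_train,epochs=epochs,verbose=0)
    MSE=model.evaluate(X_test,y_test,verbose=0)
    print(f'MSE {i+1} : {MSE}')
    y_pred = model.predict(X_test)
    mean_square_error = mean_squared_error(y_test, y_pred)
    mse.append(mean_square_error)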

MSE 1 : 94.48351287841797
MSE 2 : 126.07400512695312
MSE 3 : 109.57637786865234
MSE 4 : 122.18971252441406
MSE 5 : 122.3802490234375
MSE 6 : 108.4609603881836
MSE 7 : 135.21310424804688
MSE 8 : 106.96836853027344
MSE 9 : 136.3834991455078
MSE 10 : 119.80699157714844
MSE 11 : 104.04444885253906
MSE 12 : 100.41139221191406
MSE 13 : 115.05136108398438
MSE 14 : 115.62496185302734
MSE 15 : 108.8838119506836
MSE 16 : 108.71155548095703
MSE 17 : 104.7904281616211
MSE 18 : 97.59019470214844
MSE 19 : 94.40612030029297
MSE 20 : 114.09965515136719
MSE 21 : 100.75159454345703
MSE 22 : 100.71129608154297
MSE 23 : 47.595951080322266
MSE 24 : 49.2398567199707
MSE 25 : 52.15057373046875
MSE 26 : 54.96613311767578
MSE 27 : 50.24462890625
MSE 28 : 44.966529846191406
MSE 29 : 53.97052001953125
MSE 30 : 49.656105041503906
MSE 31 : 53.41554641723633
MSE 32 : 42.70719909667969
MSE 33 : 55.58005142211914
MSE 34 : 49.82334518432617
MSE 35 : 55.228023529052734
MSE 36 : 53.84105682373047
MSE 37 : 52.75980758666
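
The loop above keeps training one `model` across all splits, so later iterations start from weights that have already seen rows now in the test set. That is consistent with the printed MSE falling from roughly 100 to roughly 50 partway through. A sketch of the alternative, in which every run builds a fresh model, is shown below. To keep it runnable without TensorFlow it uses a hypothetical `LinearStandIn` class and synthetic data. Both are assumptions made for illustration, as are the `repeated_mse` helper and its parameter names.

```python
import numpy as np


class LinearStandIn:
    """Hypothetical stand-in for a Keras model: least-squares linear fit."""

    def fit(self, X, y):
        A = np.column_stack([X, np.ones(len(X))])   # add intercept column
        self.coef_, *_ = np.linalg.lstsq(A, y, rcond=None)
        return self

    def predict(self, X):
        return np.column_stack([X, np.ones(len(X))]) @ self.coef_


def repeated_mse(build_model, X, y, n_runs=50, test_fraction=0.3):
    """Fit a fresh model on a new random split each run; return the test MSEs."""
    n_test = int(np.ceil(test_fraction * len(X)))
    scores = []
    for seed in range(n_runs):
        order = np.random.default_rng(seed).permutation(len(X))
        test_idx, train_idx = order[:n_test], order[n_test:]
        model = build_model()                       # new, untrained model every run
        model.fit(X[train_idx], y[train_idx])
        residual = y[test_idx] - model.predict(X[test_idx])
        scores.append(float(np.mean(residual ** 2)))
    return scores


# Synthetic data, only to make the sketch runnable.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = X @ np.arange(1.0, 9.0) + rng.normal(scale=0.5, size=200)

scores = repeated_mse(LinearStandIn, X, y)
print(len(scores), np.mean(scores), np.std(scores))
```

In the notebook itself, the equivalent change would be a single `model=regression_model()` line at the top of the loop body.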

In [36]:
mean_squared_errors = np.array(mse)
mean = np.mean(mean_squared_errors)
standard_deviation = np.std(mean_squared_errors)

In [41]:
print(f"The mean and standard deviation of {str(total_mse)} MSE without normalized data. Total number of epochs for each training is: {str(epochs)}\n")
print(f"Mean: {str(mean)}")
print(f"Standard Deviation: {str(standard_deviation)}")

The mean and standard deviation of 50 MSE without normalized data. Total number of epochs for each training is: 50

Mean: 78.3080944950562
Standard Deviation: 30.363794584425648
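
One detail worth knowing when reporting the spread: `np.std` divides by n by default (population standard deviation), while passing `ddof=1` divides by n - 1 (sample standard deviation). The sketch below shows the difference on made-up values, which are not results from this notebook.

```python
import numpy as np

scores = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # illustrative values only

population_std = np.std(scores)          # ddof=0 (default): divide by n
sample_std = np.std(scores, ddof=1)      # ddof=1: divide by n - 1

print(population_std)  # 2.0
print(sample_std)      # slightly larger than the population value
```

With 50 runs the two values differ by about 1%, so either is reasonable as long as the report states which one was used.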
