# Assignment: Build a Regression Model in Keras 

# <a href="#parta">Part (A)</a>

Use the Keras library to build a neural network with the following:

- One hidden layer of 10 nodes, and a ReLU activation function

- Use the adam optimizer and the mean squared error as the loss function.

1. Randomly split the data into training and test sets by holding out 30% of the data for testing. You can use the train_test_split helper function from Scikit-learn.

2. Train the model on the training data using 50 epochs.

3. Evaluate the model on the test data and compute the mean squared error between the predicted concrete strength and the actual concrete strength. You can use the mean_squared_error function from Scikit-learn.

4. Repeat steps 1 - 3, 50 times, i.e., create a list of 50 mean squared errors.

5. Report the mean and the standard deviation of the mean squared errors.
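
Steps 1 to 5 amount to a repeated hold-out evaluation. The sketch below shows only that protocol, under stated assumptions: synthetic data stands in for the concrete dataset, and an ordinary least-squares fit stands in for the Keras network so that the example runs with NumPy alone. The helper names `split_train_test` and `fit_and_score` are hypothetical and not part of the assignment.

```python
import numpy as np

def split_train_test(X, y, test_fraction, rng):
    """Shuffle row indices and hold out `test_fraction` of the rows for testing."""
    n_test = int(round(len(X) * test_fraction))
    order = rng.permutation(len(X))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

def fit_and_score(X_train, X_test, y_train, y_test):
    """Fit a fresh stand-in model (least squares with intercept); return its test MSE."""
    A_train = np.column_stack([X_train, np.ones(len(X_train))])
    A_test = np.column_stack([X_test, np.ones(len(X_test))])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    residuals = y_test - A_test @ coef
    return float(np.mean(residuals ** 2))

# Assumption: synthetic data with the same shape as the concrete dataset
# (1030 rows, 8 predictors); the real data would be loaded instead.
data_rng = np.random.default_rng(0)
X = data_rng.normal(size=(1030, 8))
y = X @ data_rng.normal(size=8) + data_rng.normal(scale=0.5, size=1030)

n_repeats = 50
mses = []
for seed in range(n_repeats):
    # Steps 1-3: new random split, new model, test-set MSE.
    rng = np.random.default_rng(seed)
    parts = split_train_test(X, y, test_fraction=0.3, rng=rng)
    mses.append(fit_and_score(*parts))

# Steps 4-5: summarize the 50 errors.
mses = np.array(mses)
print("mean of MSEs:", mses.mean())
print("std of MSEs: ", mses.std())
```

The point of the sketch is that each repetition draws a new split and fits a new model, so the 50 errors are comparable measurements of the same procedure. Note that `ndarray.std()` computes the population standard deviation by default (`ddof=0`); pass `ddof=1` for the sample estimate.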



In [1]:
# @title Importing Library
import pandas as pd
import numpy as np

# Libraries for the model
import keras
from keras.models import Sequential
from keras.layers import Dense

# For data splitting
from sklearn.model_selection import train_test_split

# For mean squared error
from sklearn.metrics import mean_squared_error

Using TensorFlow backend.


In [3]:
# @title Loading the dataset

concrete_data=pd.read_csv('https://cocl.us/concrete_data')

In [4]:
concrete_data.sample(5)

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| 332 | 246.8 | 0.0 | 125.1 | 143.3 | 12.0 | 1086.8 | 800.9 | 56 | 60.32 |
| 522 | 284.0 | 15.0 | 141.0 | 179.0 | 5.5 | 842.0 | 801.0 | 56 | 44.52 |
| 455 | 213.5 | 0.0 | 174.2 | 159.2 | 11.7 | 1043.6 | 771.9 | 56 | 51.26 |
| 90 | 389.9 | 189.0 | 0.0 | 145.9 | 22.0 | 944.7 | 755.8 | 3 | 40.6 |
| 106 | 362.6 | 189.0 | 0.0 | 164.9 | 11.6 | 944.7 | 755.8 | 7 | 55.9 |


**The dataset describes the compressive strength of different concrete samples as a function of the volumes of the ingredients used to make them, together with the age of each sample. Ingredients include:**

1. Cement

2. Blast Furnace Slag

3. Fly Ash

4. Water

5. Superplasticizer

6. Coarse Aggregate

7. Fine Aggregate

### Let's check how many data points we have.

In [5]:
concrete_data.shape

(1030, 9)

In [6]:
concrete_data.describe()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| count | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 |
| mean | 281.167864 | 73.895825 | 54.18835 | 181.567282 | 6.20466 | 972.918932 | 773.580485 | 45.662136 | 35.817961 |
| std | 104.506364 | 86.279342 | 63.997004 | 21.354219 | 5.973841 | 77.753954 | 80.17598 | 63.169912 | 16.705742 |
| min | 102.0 | 0.0 | 0.0 | 121.8 | 0.0 | 801.0 | 594.0 | 1.0 | 2.33 |
| 25% | 192.375 | 0.0 | 0.0 | 164.9 | 0.0 | 932.0 | 730.95 | 7.0 | 23.71 |
| 50% | 272.9 | 22.0 | 0.0 | 185.0 | 6.4 | 968.0 | 779.5 | 28.0 | 34.445 |
| 75% | 350.0 | 142.95 | 118.3 | 192.0 | 10.2 | 1029.4 | 824.0 | 56.0 | 46.135 |
| max | 540.0 | 359.4 | 200.1 | 247.0 | 32.2 | 1145.0 | 992.6 | 365.0 | 82.6 |


In [7]:
concrete_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1030 entries, 0 to 1029
Data columns (total 9 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Cement              1030 non-null   float64
 1   Blast Furnace Slag  1030 non-null   float64
 2   Fly Ash             1030 non-null   float64
 3   Water               1030 non-null   float64
 4   Superplasticizer    1030 non-null   float64
 5   Coarse Aggregate    1030 non-null   float64
 6   Fine Aggregate      1030 non-null   float64
 7   Age                 1030 non-null   int64  
 8   Strength            1030 non-null   float64
dtypes: float64(8), int64(1)
memory usage: 72.5 KB


In [8]:
concrete_data.isnull().sum()

Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64

There are **1030** samples in the dataset.

**Strength is our target**

**The data looks very clean and is ready to be used to build our model.**

In [13]:
# Split data into predictors and target

predictors = concrete_data.iloc[:,:-1] # strength is the last column so this will exclude the last column.
target = concrete_data['Strength'] # Strength column

In [14]:
predictors.sample(5)

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 943 | 151.6 | 0.0 | 111.9 | 184.4 | 7.9 | 992.0 | 815.9 | 28 |
| 913 | 298.0 | 0.0 | 107.0 | 164.0 | 13.0 | 953.0 | 784.0 | 28 |
| 754 | 540.0 | 0.0 | 0.0 | 173.0 | 0.0 | 1125.0 | 613.0 | 90 |
| 585 | 290.2 | 193.5 | 0.0 | 185.7 | 0.0 | 998.2 | 704.3 | 28 |
| 501 | 491.0 | 26.0 | 123.0 | 210.0 | 3.9 | 882.0 | 699.0 | 3 |


In [15]:
target.head(3)

0    79.99
1    61.89
2    40.27
Name: Strength, dtype: float64

In [20]:
# No. of features
n_cols=predictors.shape[1]
n_cols

8

<div id="parta"></div>

The function below creates a model with one hidden layer of 10 neurons and a ReLU activation function. It uses the adam optimizer and mean squared error as the loss function.

The function uses the Keras Sequential class imported above.
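
To make the architecture concrete, here is a minimal NumPy sketch of the forward pass such a network computes: 8 inputs, one hidden layer of 10 ReLU units, and a single linear output. The weights are random placeholders rather than trained values, and the names `W1`, `b1`, `W2`, `b2` and `forward` are illustrative only, not Keras internals.

```python
import numpy as np

n_inputs, n_hidden, n_outputs = 8, 10, 1

# Placeholder parameters (random, untrained) with the shapes the layers imply.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(n_inputs, n_hidden))   # hidden-layer weights
b1 = np.zeros(n_hidden)                      # hidden-layer biases
W2 = rng.normal(size=(n_hidden, n_outputs))  # output-layer weights
b2 = np.zeros(n_outputs)                     # output-layer bias

def forward(X):
    """Hidden layer with ReLU, followed by a linear output unit."""
    hidden = np.maximum(0.0, X @ W1 + b1)
    return hidden @ W2 + b2

X_batch = rng.normal(size=(5, n_inputs))     # five made-up samples
predictions = forward(X_batch)

# Parameter count: (8*10 + 10) + (10*1 + 1)
n_params = W1.size + b1.size + W2.size + b2.size
print(predictions.shape)  # (5, 1)
print(n_params)           # 101
```

The count of 101 trainable parameters (90 in the hidden layer, 11 in the output layer) is what a network of this shape should have, which gives a quick sanity check on the model definition.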

In [17]:
# define regression model
def regression_model():
    # create model
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,))) # hidden layer with 10 nodes and ReLU activation
    model.add(Dense(1)) # output layer: a single unit for the predicted strength

    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model

In [18]:
# Let's split the data into training and test sets

X_train, X_test, y_train, y_test = train_test_split(predictors, target, test_size=0.3, random_state=42)

### Train and Test the Network

**Let's call the function now to create our model.**

In [21]:
# build the model
model = regression_model()
epochs=50

# Fit
model.fit(X_train, y_train, epochs=epochs, verbose=1)

Epoch 1/50
...
Epoch 50/50


<keras.callbacks.History at 0x7fa0d72fea90>

In [22]:
# Evaluate the model on the test data.

loss_val = model.evaluate(X_test, y_test)
y_pred = model.predict(X_test)
loss_val



226.45240996030543
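
For reference, the mean squared error is the average of the squared differences between actual and predicted values. The sketch below computes it by hand with NumPy; the function name `mse` and the toy numbers are made up for illustration. It also flattens both inputs first, because `model.predict` returns a column of shape `(n, 1)` while the targets have shape `(n,)`, and subtracting those directly would broadcast to an `(n, n)` matrix.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error, flattening inputs so (n,) and (n, 1) shapes agree."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return float(np.mean((y_true - y_pred) ** 2))

# Toy check: errors are 0, 0 and 2, so the MSE is (0 + 0 + 4) / 3.
example = mse([1.0, 2.0, 3.0], [[1.0], [2.0], [5.0]])
print(example)

# The square root puts the reported test loss back in the units of Strength.
rmse_reported = float(np.sqrt(226.45240996030543))
print(rmse_reported)
```

Taking the square root of the test loss above gives a root mean squared error of roughly 15, which is large relative to the standard deviation of Strength (about 16.7) shown in the summary statistics.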

### Create a list of 50 mean squared errors and report their mean and standard deviation.

In [23]:
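# Create a list of 50 mean squared errors and report their mean and standard deviation.
total_mean_squared_errors = 50
epochs = 50
mean_squared_errors = []
# Repeat the split/train/evaluate cycle 50 times, with a different random split each time.
# Note: `model` is the network already trained above and is not re-created here, so every
# iteration continues training the same weights. Call regression_model() inside the loop
# to train a fresh network for each split.
for i in range(0, total_mean_squared_errors):
    X_train, X_test, y_train, y_test = train_test_split(predictors, target, test_size=0.3, random_state=i)
    model.fit(X_train, y_train, epochs=epochs, verbose=0)
    # Test loss reported by Keras (mean squared error)
    MSE = model.evaluate(X_test, y_test, verbose=0)
    print("MSE "+str(i+1)+": "+str(MSE))
    # Same metric computed with Scikit-learn from the predictions
    y_pred = model.predict(X_test)
    mean_square_error = mean_squared_error(y_test, y_pred)
    mean_squared_errors.append(mean_square_error)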

mean_squared_errors = np.array(mean_squared_errors)
mean = np.mean(mean_squared_errors)
standard_deviation = np.std(mean_squared_errors)

print('\n')
print("Below is the mean and standard deviation of " +str(total_mean_squared_errors) + " mean squared errors without normalized data. Total number of epochs for each training is: " +str(epochs) + "\n")
print("Mean: "+str(mean))
print("Standard Deviation: "+str(standard_deviation))

MSE 1: 94.24556840816362
MSE 2: 77.85111258793803
MSE 3: 56.25663184502364
MSE 4: 57.70628184099414
MSE 5: 49.691781534731966
MSE 6: 54.102544445050185
MSE 7: 60.609713606849844
MSE 8: 49.239935013854385
MSE 9: 50.45509323564548
MSE 10: 51.46012295719875
MSE 11: 47.24440183917296
MSE 12: 45.375332637897976
MSE 13: 56.65442985238381
MSE 14: 54.427641414901586
MSE 15: 49.195795892511754
MSE 16: 44.414679641476724
MSE 17: 48.71733929038434
MSE 18: 50.2742357963883
MSE 19: 44.29165174120067
MSE 20: 47.87252113811407
MSE 21: 45.67616884299466
MSE 22: 46.99404118361982
MSE 23: 44.41588411053407
MSE 24: 46.28964927204218
MSE 25: 50.57130197568233
MSE 26: 48.59878789414094
MSE 27: 50.12507715811621
MSE 28: 47.99025476943328
MSE 29: 57.73277988865923
MSE 30: 49.520822605268854
MSE 31: 52.10597745037388
MSE 32: 43.04650686319592
MSE 33: 47.46149851974932
MSE 34: 49.26655405779101
MSE 35: 47.19001703354919
MSE 36: 52.59463478755025
MSE 37: 51.66575199114852
MSE 38: 53.052809261581274
MSE 39: 47.9