## Build a Regression Model in Keras: Part A

A. Build a baseline model (5 marks)

Use the Keras library to build a neural network with the following:

- One hidden layer of 10 nodes, and a ReLU activation function

- Use the Adam optimizer and mean squared error as the loss function.

1. Randomly split the data into training and test sets, holding out 30% of the data for testing. You can use the train_test_split helper function from Scikit-learn.

2. Train the model on the training data using 50 epochs.

3. Evaluate the model on the test data and compute the mean squared error between the predicted concrete strength and the actual concrete strength. You can use the mean_squared_error function from Scikit-learn.

4. Repeat steps 1 - 3, 50 times, i.e., create a list of 50 mean squared errors.

5. Report the mean and the standard deviation of the mean squared errors.

Submit your Jupyter Notebook with your code and comments.
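The metric used in steps 3 to 5 is the mean squared error, `MSE = (1/n) * sum((y_i - yhat_i)^2)`. As a minimal sketch, assuming only NumPy, the function below computes the same quantity as Scikit-learn's `mean_squared_error` for a single target column; the name `mse` is our own, not a library function.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: the average of the squared residuals."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    # Keras' model.predict returns shape (n, 1); without ravel(), an (n,) array
    # minus an (n, 1) array would broadcast to an (n, n) matrix
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same number of values")
    return float(np.mean((y_true - y_pred) ** 2))

# residuals are 1, 0 and -2, so the MSE is (1 + 0 + 4) / 3 = 5/3, about 1.667
print(mse([3.0, 5.0, 2.0], [[2.0], [5.0], [4.0]]))
```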

#### Importing libraries

In [2]:
import pandas as pd
import numpy as np
import keras
from keras.models import Sequential
from keras.layers import Dense

Using TensorFlow backend.


#### Downloading the data

In [3]:
concrete_data = pd.read_csv('https://cocl.us/concrete_data')
concrete_data.head()

|   | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 | 79.99 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 | 61.89 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 40.27 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 | 41.05 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 | 44.3 |


#### Checking the data

In [4]:
concrete_data.shape

(1030, 9)

In [5]:
concrete_data.describe()

|   | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| count | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 |
| mean | 281.167864 | 73.895825 | 54.18835 | 181.567282 | 6.20466 | 972.918932 | 773.580485 | 45.662136 | 35.817961 |
| std | 104.506364 | 86.279342 | 63.997004 | 21.354219 | 5.973841 | 77.753954 | 80.17598 | 63.169912 | 16.705742 |
| min | 102.0 | 0.0 | 0.0 | 121.8 | 0.0 | 801.0 | 594.0 | 1.0 | 2.33 |
| 25% | 192.375 | 0.0 | 0.0 | 164.9 | 0.0 | 932.0 | 730.95 | 7.0 | 23.71 |
| 50% | 272.9 | 22.0 | 0.0 | 185.0 | 6.4 | 968.0 | 779.5 | 28.0 | 34.445 |
| 75% | 350.0 | 142.95 | 118.3 | 192.0 | 10.2 | 1029.4 | 824.0 | 56.0 | 46.135 |
| max | 540.0 | 359.4 | 200.1 | 247.0 | 32.2 | 1145.0 | 992.6 | 365.0 | 82.6 |


In [6]:
concrete_data.isnull().sum()

Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64

#### Randomly split the data into training and test sets, holding out 30% of the data for testing (using Scikit-learn's train_test_split helper)

In [7]:
from sklearn.model_selection import train_test_split
train_set, test_set = train_test_split(concrete_data, test_size=0.30)
print('train {0}'.format(train_set.shape))
print('test {0}'.format(test_set.shape))

train (721, 9)
test (309, 9)
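Conceptually, holding out 30% means shuffling the row positions and reserving a fraction of them for testing. The sketch below is a simplified NumPy stand-in for `train_test_split`, not its actual implementation, and the name `holdout_indices` is hypothetical. Like Scikit-learn with a float `test_size`, it rounds the test-set size up, which reproduces the 721/309 split of the 1030 rows shown above.

```python
import math
import numpy as np

def holdout_indices(n_samples, test_size=0.30, seed=None):
    """Return (train_idx, test_idx) row positions for a random holdout split."""
    rng = np.random.default_rng(seed)
    n_test = math.ceil(test_size * n_samples)  # round the test-set size up
    shuffled = rng.permutation(n_samples)      # random order of row positions
    test_idx = shuffled[:n_test]
    train_idx = shuffled[n_test:]
    return train_idx, test_idx

train_idx, test_idx = holdout_indices(1030, test_size=0.30, seed=0)
print(len(train_idx), len(test_idx))  # 721 309
# with a DataFrame: train_set = concrete_data.iloc[train_idx]
#                   test_set = concrete_data.iloc[test_idx]
```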


#### Our target variable is Strength, so all the other columns are predictors

In [8]:
concrete_data_columns = train_set.columns
# training set
predictors_train_set = train_set[concrete_data_columns[concrete_data_columns != 'Strength']] # all columns except Strength
target_train_set = train_set['Strength'] # Strength column

#test set
predictors_test_set = test_set[concrete_data_columns[concrete_data_columns != 'Strength']] # all columns except Strength
target_test_set = test_set['Strength'] # Strength column

In [9]:
num_cols = predictors_train_set.shape[1] # number of predictors
print("We have {0} features".format(num_cols))

We have 8 features


In [10]:
target_train_set.head()

127    55.50
339    21.91
583    37.81
125    56.40
808    11.47
Name: Strength, dtype: float64

## 1. One hidden layer of 10 nodes, and a ReLU activation function

In [11]:
from keras.models import Sequential
from keras.layers import Dense
# define regression model
def regression_model_one_layer():
    # create model with one hidden layer
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(num_cols,)))
    model.add(Dense(1))
    
    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model
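For intuition, the model defined above computes `Dense(1)(relu(Dense(10)(x)))`. With 8 inputs that is 8 × 10 + 10 = 90 parameters in the hidden layer and 10 × 1 + 1 = 11 in the output layer, 101 in total, which is the count `model.summary()` should report. The NumPy sketch below mirrors that forward pass using Glorot-uniform kernels and zero biases (the Keras `Dense` defaults); the helper names `init_dense` and `forward` are illustrative, not part of Keras.

```python
import numpy as np

def init_dense(fan_in, fan_out, rng):
    """Glorot-uniform kernel and zero bias, as Keras Dense does by default."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    W = rng.uniform(-limit, limit, size=(fan_in, fan_out))
    b = np.zeros(fan_out)
    return W, b

def forward(X, params):
    """One ReLU hidden layer followed by a single linear output unit."""
    (W1, b1), (W2, b2) = params
    hidden = np.maximum(0.0, X @ W1 + b1)  # ReLU activation, shape (n, 10)
    return hidden @ W2 + b2                # linear output, shape (n, 1)

rng = np.random.default_rng(0)
n_features = 8                             # num_cols in the notebook
params = [init_dense(n_features, 10, rng), init_dense(10, 1, rng)]
n_params = sum(W.size + b.size for W, b in params)
print(n_params)                            # 101
X = rng.normal(size=(5, n_features))
print(forward(X, params).shape)            # (5, 1)
```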

## 2. Train the model on the training data using 50 epochs.
## 3. Evaluate the model on the test data and compute the mean squared error between the predicted concrete strength and the actual concrete strength. You can use the mean_squared_error function from Scikit-learn.
## 4. Repeat steps 1 - 3, 50 times, i.e., create a list of 50 mean squared errors.

In [12]:

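import statistics
from sklearn.metrics import mean_squared_error

# build the model once; each fit() call below continues training these same weights
model = regression_model_one_layer()
mse_results = []
for _ in range(50):
    # train for another 50 epochs on the (fixed) training split
    model.fit(predictors_train_set, target_train_set, epochs=50, verbose=0)
    # predict concrete strength for the held-out test rows
    test_predictions = model.predict(predictors_test_set)
    # MSE between actual and predicted strength on the test set
    mse_results.append(mean_squared_error(target_test_set, test_predictions))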


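In the cell above, the model is built once before the loop, and the train/test split comes from the single earlier call to `train_test_split`. Each `fit` call therefore keeps training the same weights on the same split, so the 50 errors are not 50 independent repetitions of steps 1 to 3. A minimal sketch of a loop that re-splits the data and builds a fresh model on every repetition follows. It assumes a `build_model` factory such as `regression_model_one_layer` that returns an object with Keras-style `fit(X, y, epochs=..., verbose=...)` and `predict(X)` methods; the function name `repeated_holdout_mse` is hypothetical, and it uses a NumPy permutation in place of `train_test_split`.

```python
import math
import numpy as np

def repeated_holdout_mse(X, y, build_model, n_repeats=50, epochs=50,
                         test_size=0.30, seed=None):
    """Repeat split / train / evaluate, returning one test MSE per repetition."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float).ravel()
    rng = np.random.default_rng(seed)
    n_test = math.ceil(test_size * len(y))
    mse_list = []
    for _ in range(n_repeats):
        # step 1: a fresh random 70/30 split
        idx = rng.permutation(len(y))
        test_idx, train_idx = idx[:n_test], idx[n_test:]
        # a new, untrained model so no weights carry over between repetitions
        model = build_model()
        # step 2: train on the training rows only
        model.fit(X[train_idx], y[train_idx], epochs=epochs, verbose=0)
        # step 3: MSE on the held-out rows
        y_pred = np.asarray(model.predict(X[test_idx]), dtype=float).ravel()
        mse_list.append(float(np.mean((y[test_idx] - y_pred) ** 2)))
    return mse_list

# hypothetical usage with the notebook's objects:
# predictors = concrete_data.drop(columns='Strength')
# mses = repeated_holdout_mse(predictors, concrete_data['Strength'], regression_model_one_layer)
# print(statistics.mean(mses), statistics.stdev(mses))
```

Building the model inside the loop costs a little time per repetition, but it is what makes the reported mean and standard deviation describe the spread over independent splits and initialisations.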

In [15]:
print("Mean of the list of mean square errors is {}".format(statistics.mean(mse_results)))
print("Standard deviation of the list of mean square errors is {}".format(statistics.stdev(mse_results)))

Mean of the list of mean square errors is 50.52458408459279
Standard deviation of the list of mean square errors is 10.466389051646118
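`statistics.stdev` returns the sample standard deviation, which divides by n − 1, whereas `statistics.pstdev` (and NumPy's `np.std` with its default `ddof=0`) divides by n. With 50 values the two differ by a factor of sqrt(50/49), about 1%, so the choice barely moves the figure above, but it is worth knowing which one is reported. A small check with made-up numbers, not the notebook's results:

```python
import math
import statistics

values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # illustrative data only
sample_sd = statistics.stdev(values)       # divides by n - 1
population_sd = statistics.pstdev(values)  # divides by n
print(population_sd)                       # 2.0
n = len(values)
# the two estimates differ by exactly sqrt(n / (n - 1))
print(math.isclose(sample_sd, population_sd * math.sqrt(n / (n - 1))))  # True
```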
