# Regression Model in Keras

#### In this project, we build a regression model using the Keras deep learning library, then experiment with increasing the number of training epochs and the number of hidden layers to see how these changes affect the model's performance.

## A. Build a baseline model

Use the Keras library to build a neural network with the following:

- One hidden layer of 10 nodes, and a ReLU activation function

- The adam optimizer, with mean squared error as the loss function


1. Randomly split the data into training and test sets, holding out 30% of the data for testing. Use the train_test_split helper function from Scikit-learn.

2. Train the model on the training data using 50 epochs.

3. Evaluate the model on the test data and compute the mean squared error between the predicted concrete strength and the actual concrete strength. You can use the mean_squared_error function from Scikit-learn.

4. Repeat steps 1 - 3, 50 times, i.e., create a list of 50 mean squared errors.

5. Report the mean and the standard deviation of the mean squared errors.
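
The metric in step 3 can be written out directly. The snippet below is a minimal sketch in plain NumPy; the helper name `mse` and the three example values are illustrative assumptions, not part of the assignment. For one-dimensional targets it computes the same quantity as Scikit-learn's mean_squared_error.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error between two equally long 1-D sequences."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return float(np.mean((y_true - y_pred) ** 2))

# Illustrative values only: squared errors are 4, 0 and 9, so the mean is 13/3.
example = mse([40.0, 50.0, 60.0], [42.0, 50.0, 57.0])
print(example)
```

The `ravel()` calls matter when predictions come back with shape `(n, 1)`, as they typically do from a network with a single output unit.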







Import pandas, NumPy, and Keras, along with the Keras classes and the Scikit-learn helper used below:

In [1]:
import pandas as pd
import numpy as np
import keras
#import sklearn
from keras.models import Sequential
from keras.layers import Dense
from sklearn.model_selection import train_test_split

#### Getting Data 

The dataset is about the compressive strength of different samples of concrete based on the volumes of the different ingredients that were used to make them. Ingredients include:

1. Cement

2. Blast Furnace Slag

3. Fly Ash

4. Water

5. Superplasticizer

6. Coarse Aggregate

7. Fine Aggregate

In [2]:
concrete_data = pd.read_csv('https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DL0101EN/labs/data/concrete_data.csv')
concrete_data.head(10)

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age,Strength
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28,79.99
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28,61.89
2,332.5,142.5,0.0,228.0,0.0,932.0,594.0,270,40.27
3,332.5,142.5,0.0,228.0,0.0,932.0,594.0,365,41.05
4,198.6,132.4,0.0,192.0,0.0,978.4,825.5,360,44.3
5,266.0,114.0,0.0,228.0,0.0,932.0,670.0,90,47.03
6,380.0,95.0,0.0,228.0,0.0,932.0,594.0,365,43.7
7,380.0,95.0,0.0,228.0,0.0,932.0,594.0,28,36.45
8,266.0,114.0,0.0,228.0,0.0,932.0,670.0,28,45.85
9,475.0,0.0,0.0,228.0,0.0,932.0,594.0,28,39.29


In [3]:
# Check how many data points we have
concrete_data.shape


(1030, 9)

We have 1,030 samples and 9 columns (8 predictors plus the target).




Check the summary statistics and look for missing values:

In [4]:
concrete_data.describe()


Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age,Strength
count,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0
mean,281.167864,73.895825,54.18835,181.567282,6.20466,972.918932,773.580485,45.662136,35.817961
std,104.506364,86.279342,63.997004,21.354219,5.973841,77.753954,80.17598,63.169912,16.705742
min,102.0,0.0,0.0,121.8,0.0,801.0,594.0,1.0,2.33
25%,192.375,0.0,0.0,164.9,0.0,932.0,730.95,7.0,23.71
50%,272.9,22.0,0.0,185.0,6.4,968.0,779.5,28.0,34.445
75%,350.0,142.95,118.3,192.0,10.2,1029.4,824.0,56.0,46.135
max,540.0,359.4,200.1,247.0,32.2,1145.0,992.6,365.0,82.6


In [5]:
concrete_data.isnull().sum()


Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64

The data contains no missing values and is ready for modeling.

#### Splitting Data into predictors and target

In [6]:
concrete_data_columns = concrete_data.columns


In [7]:
# Including all columns except Strength
predictors = concrete_data.iloc[:, :-1]
predictors.head(10)

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28
2,332.5,142.5,0.0,228.0,0.0,932.0,594.0,270
3,332.5,142.5,0.0,228.0,0.0,932.0,594.0,365
4,198.6,132.4,0.0,192.0,0.0,978.4,825.5,360
5,266.0,114.0,0.0,228.0,0.0,932.0,670.0,90
6,380.0,95.0,0.0,228.0,0.0,932.0,594.0,365
7,380.0,95.0,0.0,228.0,0.0,932.0,594.0,28
8,266.0,114.0,0.0,228.0,0.0,932.0,670.0,28
9,475.0,0.0,0.0,228.0,0.0,932.0,594.0,28


In [8]:
#Only Strength Column
target = concrete_data['Strength']
target.head(10)

0    79.99
1    61.89
2    40.27
3    41.05
4    44.30
5    47.03
6    43.70
7    36.45
8    45.85
9    39.29
Name: Strength, dtype: float64

Let's save the number of predictors to n_cols since we will need this number when building our network.

In [9]:
#number of predictors 
n_cols = predictors.shape[1]

The baseline network follows the specification from Part A:

- One hidden layer of 10 nodes, and a ReLU activation function
- Use the adam optimizer and the mean squared error as the loss function.

#### Build a Neural Network

Let's define a function that defines our regression model for us so that we can conveniently call it to create our model.

In [10]:
# define regression model

def regression_model():
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(1))
    
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model
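
To get a feel for how small this network is, the number of trainable parameters can be worked out by hand: each Dense layer contributes `inputs * units` weights plus `units` biases. The sketch below assumes 8 predictors (the dataset has nine columns, one of which is the target); `dense_param_count` is a hypothetical helper, not a Keras function.

```python
def dense_param_count(n_inputs, layer_units):
    """Count weights and biases in a stack of fully connected layers."""
    total = 0
    fan_in = n_inputs
    for units in layer_units:
        total += fan_in * units + units  # weight matrix plus bias vector
        fan_in = units                   # this layer's output feeds the next
    return total

# One hidden layer of 10 nodes followed by a single output node.
baseline = dense_param_count(8, [10, 1])
# Three hidden layers of 10 nodes each, as in Part D.
deeper = dense_param_count(8, [10, 10, 10, 1])
print(baseline, deeper)
```

Calling `model.summary()` on the compiled Keras model should report the same totals.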

Build the model:

In [11]:
model = regression_model()


### Train and Test the Network

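Train and evaluate the model in one call to the fit method by passing the test split as validation_data. Each cycle holds out 30% of the data for testing and trains for 50 epochs. Note that the same model object is reused in every cycle, so its weights carry over from one cycle to the next and the reported errors reflect cumulative training rather than 50 independent runs.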

In [12]:
list_of_mean_squared_error = []
for cycle in range(50):
    #split the data into a training set (70%) and a test set (30%):  
    X_train, X_test, y_train, y_test = train_test_split(predictors , target, test_size = 0.3)
    #Train and test the model at the same time
    res = model.fit(X_train, y_train, epochs=50, verbose=0, validation_data = ( X_test , y_test))
    #Finding mean_squared_error as last value in history.
    mean_squared_error = res.history['val_loss'][-1]
    #Adding value of mean_squared_error for every cycle in list.
    list_of_mean_squared_error.append(mean_squared_error)
    print('Cycle value #{}: mean_squared_error {}'.format(cycle+1, mean_squared_error))

Cycle value #1: mean_squared_error 226.87924194335938
Cycle value #2: mean_squared_error 150.17823791503906
Cycle value #3: mean_squared_error 131.70521545410156
Cycle value #4: mean_squared_error 125.64649963378906
Cycle value #5: mean_squared_error 119.2017593383789
Cycle value #6: mean_squared_error 112.25725555419922
Cycle value #7: mean_squared_error 96.59072875976562
Cycle value #8: mean_squared_error 88.8746337890625
Cycle value #9: mean_squared_error 76.23237609863281
Cycle value #10: mean_squared_error 81.1646957397461
Cycle value #11: mean_squared_error 70.36358642578125
Cycle value #12: mean_squared_error 59.214073181152344
Cycle value #13: mean_squared_error 55.94407272338867
Cycle value #14: mean_squared_error 59.54146957397461
Cycle value #15: mean_squared_error 57.24049377441406
Cycle value #16: mean_squared_error 55.23329544067383
Cycle value #17: mean_squared_error 61.243629455566406
Cycle value #18: mean_squared_error 55.661773681640625
Cycle value #19: mean_squared_e
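
The steadily falling error above is consistent with the loop reusing one model across all cycles: training continues where the previous cycle stopped, and rows that are test rows in one cycle were training rows in earlier ones. A variant that builds a fresh model per cycle could look like the sketch below. It is an assumption-laden outline, not the code that produced the numbers above: `repeated_holdout_mse` and `LeastSquaresStandIn` are hypothetical names, the stand-in class only mimics the `fit`/`predict` interface so the sketch runs without Keras, and the data is synthetic.

```python
import numpy as np

def repeated_holdout_mse(build_model, X, y, n_cycles=50, test_size=0.3,
                         epochs=50, seed=0):
    """Return one test MSE per cycle, training a newly built model each time."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_test = int(round(test_size * len(X)))
    scores = []
    for _ in range(n_cycles):
        order = rng.permutation(len(X))      # new random split every cycle
        test_idx, train_idx = order[:n_test], order[n_test:]
        model = build_model()                # fresh weights, nothing carries over
        model.fit(X[train_idx], y[train_idx], epochs=epochs, verbose=0)
        pred = np.asarray(model.predict(X[test_idx])).ravel()
        scores.append(float(np.mean((y[test_idx] - pred) ** 2)))
    return scores

class LeastSquaresStandIn:
    """Hypothetical placeholder with a Keras-like fit/predict interface."""
    def fit(self, X, y, epochs=1, verbose=0):
        A = np.column_stack([X, np.ones(len(X))])   # add an intercept column
        self.coef_, *_ = np.linalg.lstsq(A, y, rcond=None)
        return self

    def predict(self, X):
        return np.column_stack([X, np.ones(len(X))]) @ self.coef_

# Synthetic data, for illustration only.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(200, 8))
y_demo = X_demo @ np.arange(1.0, 9.0) + rng.normal(scale=0.5, size=200)
scores = repeated_holdout_mse(LeastSquaresStandIn, X_demo, y_demo, n_cycles=5)
print(len(scores), np.mean(scores), np.std(scores))
```

In the notebook, passing `regression_model` as `build_model` together with `predictors` and `target` would be the intended use, since the Keras model exposes the same `fit` and `predict` methods.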

Find the mean and the standard deviation of the mean squared errors:



In [13]:
print('The mean of the mean squared errors: {}'.format(np.mean(list_of_mean_squared_error)))
print('The standard deviation of the mean squared errors: {}'.format(np.std(list_of_mean_squared_error)))

The mean of the mean squared errors: 65.38046295166015
The standard deviation of the mean squared errors: 33.8784517631442
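
One detail worth knowing when reading the standard deviation: `np.std` divides by `n` by default (`ddof=0`), whereas the sample standard deviation divides by `n - 1`. The sketch below uses four made-up error values purely to show the difference; with 50 values the gap is small.

```python
import numpy as np

errors = np.array([60.0, 70.0, 80.0, 90.0])  # made-up values, illustration only

population_std = np.std(errors)        # ddof=0, the default used by np.std
sample_std = np.std(errors, ddof=1)    # ddof=1, divides by n - 1 instead of n

print(population_std, sample_std)
```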


## B. Normalize the data 

Repeat Part A but use a normalized version of the data. Recall that one way to normalize the data is by subtracting the mean from the individual predictors and dividing by the standard deviation.

Standardize each predictor column using its mean and standard deviation:

In [14]:
predictors_norm = (predictors - predictors.mean()) / predictors.std()
predictors_norm.head(10)

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age
0,2.476712,-0.856472,-0.846733,-0.916319,-0.620147,0.862735,-1.217079,-0.279597
1,2.476712,-0.856472,-0.846733,-0.916319,-0.620147,1.055651,-1.217079,-0.279597
2,0.491187,0.79514,-0.846733,2.174405,-1.038638,-0.526262,-2.239829,3.55134
3,0.491187,0.79514,-0.846733,2.174405,-1.038638,-0.526262,-2.239829,5.055221
4,-0.790075,0.678079,-0.846733,0.488555,-1.038638,0.070492,0.647569,4.976069
5,-0.145138,0.464818,-0.846733,2.174405,-1.038638,-0.526262,-1.291914,0.701883
6,0.945704,0.244603,-0.846733,2.174405,-1.038638,-0.526262,-2.239829,5.055221
7,0.945704,0.244603,-0.846733,2.174405,-1.038638,-0.526262,-2.239829,-0.279597
8,-0.145138,0.464818,-0.846733,2.174405,-1.038638,-0.526262,-1.291914,-0.279597
9,1.85474,-0.856472,-0.846733,2.174405,-1.038638,-0.526262,-2.239829,-0.279597
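
The cell above computes the mean and standard deviation over all rows, including rows that later end up in a test split. A common alternative is to compute the statistics on the training rows only and apply them to both splits. The sketch below assumes that variant; `standardize_with_train_stats` is a hypothetical helper, and the example rows reuse the Cement and Age values from the first rows of the dataset shown earlier.

```python
import numpy as np

def standardize_with_train_stats(X_train, X_test):
    """Scale both splits using the mean and std of the training split only."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0, ddof=1)    # ddof=1 matches the pandas .std() default
    std = np.where(std == 0, 1.0, std)   # leave constant columns unscaled
    return (X_train - mean) / std, (X_test - mean) / std

# Columns: Cement, Age (a few rows from the table above).
train = np.array([[540.0, 28.0], [332.5, 270.0], [198.6, 360.0]])
test = np.array([[266.0, 90.0]])
train_norm, test_norm = standardize_with_train_stats(train, test)
print(train_norm)
print(test_norm)
```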


Build the model:

In [15]:
n_cols = predictors_norm.shape[1]
def regression_model2():
    model_2 = Sequential()
    model_2.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model_2.add(Dense(1))
    
    model_2.compile(optimizer='adam', loss='mean_squared_error')
    return model_2

model_2 = regression_model2()

Train and evaluate the model with the fit method as in Part A, using predictors_norm instead of predictors and training for 50 epochs. Note that the cell below passes test_size=0.2, so it holds out 20% of the data rather than the 30% used in Part A.

In [16]:
list_of_mean_squared_error = []
for cycle in range(50):
    # split the data into a training set (80%) and a test set (20%), as set by test_size=0.2:
    X_train, X_test, y_train, y_test = train_test_split(predictors_norm, target, test_size=0.2)
    #Train and test the model at the same time
    res = model_2.fit(X_train, y_train, epochs=50, verbose=0, validation_data=(X_test, y_test))
    #Finding mean_squared_error as last value in history.
    mean_squared_error = res.history['val_loss'][-1]
    #Adding value of mean_squared_error for every cycle in list.
    list_of_mean_squared_error.append(mean_squared_error)
    print('Cycle value #{}: mean_squared_error {}'.format(cycle+1, mean_squared_error))

Cycle value #1: mean_squared_error 299.9188537597656
Cycle value #2: mean_squared_error 151.49282836914062
Cycle value #3: mean_squared_error 86.79348754882812
Cycle value #4: mean_squared_error 87.87969207763672
Cycle value #5: mean_squared_error 73.90184783935547
Cycle value #6: mean_squared_error 73.05838775634766
Cycle value #7: mean_squared_error 51.09978103637695
Cycle value #8: mean_squared_error 45.27106857299805
Cycle value #9: mean_squared_error 46.32917404174805
Cycle value #10: mean_squared_error 45.99818420410156
Cycle value #11: mean_squared_error 34.86263656616211
Cycle value #12: mean_squared_error 40.047828674316406
Cycle value #13: mean_squared_error 37.015106201171875
Cycle value #14: mean_squared_error 36.02146911621094
Cycle value #15: mean_squared_error 41.69187545776367
Cycle value #16: mean_squared_error 35.21765899658203
Cycle value #17: mean_squared_error 37.92621994018555
Cycle value #18: mean_squared_error 27.262624740600586
Cycle value #19: mean_squared_err

Printing the mean and the standard deviation of the mean squared errors:

In [17]:
print('The mean of the mean squared errors: {}'.format(np.mean(list_of_mean_squared_error)))
print('The standard deviation of the mean squared errors: {}'.format(np.std(list_of_mean_squared_error)))

The mean of the mean squared errors: 45.67293006896973
The standard deviation of the mean squared errors: 41.925601263682424


The mean of the mean squared errors in case B (45.67) is lower than in case A (65.38), but the standard deviation in B (41.93) is higher than in A (33.88). In my opinion, comparing two weak neural networks that each have only one hidden layer is not very informative. Data normalization does not help much: the error is large in both cases.



## C. Increase the number of epochs

Repeat Part B but use 100 epochs this time for training.


Train and evaluate the model with the fit method. We hold out 30% of the normalized data for validation and train for 100 epochs instead of 50.

Build the model:

In [18]:
def regression_model3():
    model_3 = Sequential()
    model_3.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model_3.add(Dense(1))
    
    model_3.compile(optimizer='adam', loss='mean_squared_error')
    return model_3

model_3 = regression_model3() 

In [19]:
list_of_mean_squared_error = []
for cycle in range(50):
    # split the data into a training set (70%) and a test set (30%):  
    X_train, X_test, y_train, y_test = train_test_split(predictors_norm, target, test_size=0.3)
    #Train and test the model at the same time
    res = model_3.fit(X_train, y_train, epochs=100, verbose=0, validation_data=(X_test, y_test))
    #Finding mean_squared_error as last value in history.
    mean_squared_error = res.history['val_loss'][-1]
    #Adding value of mean_squared_error for every cycle in list.
    list_of_mean_squared_error.append(mean_squared_error)
    print('Cycle value #{}: mean_squared_error {}'.format(cycle+1, mean_squared_error))

Cycle value #1: mean_squared_error 170.4151611328125
Cycle value #2: mean_squared_error 91.01439666748047
Cycle value #3: mean_squared_error 64.60135650634766
Cycle value #4: mean_squared_error 45.29921340942383
Cycle value #5: mean_squared_error 42.689517974853516
Cycle value #6: mean_squared_error 44.542354583740234
Cycle value #7: mean_squared_error 33.08058547973633
Cycle value #8: mean_squared_error 36.49795913696289
Cycle value #9: mean_squared_error 36.78158950805664
Cycle value #10: mean_squared_error 36.93260955810547
Cycle value #11: mean_squared_error 37.18305587768555
Cycle value #12: mean_squared_error 38.11288833618164
Cycle value #13: mean_squared_error 34.1373176574707
Cycle value #14: mean_squared_error 40.264488220214844
Cycle value #15: mean_squared_error 35.47918701171875
Cycle value #16: mean_squared_error 38.60531234741211
Cycle value #17: mean_squared_error 38.30622100830078
Cycle value #18: mean_squared_error 36.383358001708984
Cycle value #19: mean_squared_erro

In [20]:
print('The mean of the mean squared errors: {}'.format(np.mean(list_of_mean_squared_error)))
print('The standard deviation of the mean squared errors: {}'.format(np.std(list_of_mean_squared_error)))

The mean of the mean squared errors: 39.622714080810546
The standard deviation of the mean squared errors: 21.034297887295477


Both the mean and the standard deviation of the mean squared errors are higher in case B (45.67 and 41.93) than in case C (39.62 and 21.03), but the error is large in both cases. In my opinion, comparing two weak neural networks with only one hidden layer is not very informative. Increasing the number of epochs does not help much.

## D. Increase the number of hidden layers

Repeat Part B but use a neural network with the following instead:

- Three hidden layers, each of 10 nodes and a ReLU activation function

How does the mean of the mean squared errors compare to that from Step B?

Create a new model with three hidden layers, each of 10 nodes and ReLU activation function.

In [21]:
def regression_model4():
    model_4 = Sequential()
    model_4.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model_4.add(Dense(10, activation='relu'))
    model_4.add(Dense(10, activation='relu'))
    model_4.add(Dense(1))
    
    model_4.compile(optimizer='adam', loss='mean_squared_error')
    return model_4

Build a new model with 3 hidden layers:

In [22]:
model_4 = regression_model4()

Train and evaluate the model with the fit method. We hold out 30% of the normalized data for validation and train the three-hidden-layer network for 50 epochs.

In [23]:
list_of_mean_squared_error = []
for cycle in range(50):
    #split the data into a training set (70%) and a test set (30%):  
    X_train, X_test, y_train, y_test = train_test_split(predictors_norm, target, test_size=0.3)
    #Train and test the model at the same time
    res = model_4.fit(X_train, y_train, epochs=50, verbose=0, validation_data=(X_test, y_test))
    #Finding mean_squared_error as last value in history.
    mean_squared_error = res.history['val_loss'][-1]
    #Adding value of mean_squared_error for every cycle in list.
    list_of_mean_squared_error.append(mean_squared_error)
    print('Cycle value #{}: mean_squared_error {}'.format(cycle+1, mean_squared_error))

Cycle value #1: mean_squared_error 141.2133331298828
Cycle value #2: mean_squared_error 89.59857177734375
Cycle value #3: mean_squared_error 63.706050872802734
Cycle value #4: mean_squared_error 54.567657470703125
Cycle value #5: mean_squared_error 40.03699493408203
Cycle value #6: mean_squared_error 42.378448486328125
Cycle value #7: mean_squared_error 36.78065490722656
Cycle value #8: mean_squared_error 38.79425048828125
Cycle value #9: mean_squared_error 36.01682662963867
Cycle value #10: mean_squared_error 36.18518829345703
Cycle value #11: mean_squared_error 34.868412017822266
Cycle value #12: mean_squared_error 35.183067321777344
Cycle value #13: mean_squared_error 33.62141799926758
Cycle value #14: mean_squared_error 34.172393798828125
Cycle value #15: mean_squared_error 28.52121353149414
Cycle value #16: mean_squared_error 29.79411506652832
Cycle value #17: mean_squared_error 31.706815719604492
Cycle value #18: mean_squared_error 31.73746681213379
Cycle value #19: mean_squared_

Printing the mean and the standard deviation of the mean squared errors:

In [24]:
print('The mean of the mean squared errors: {}'.format(np.mean(list_of_mean_squared_error)))
print('The standard deviation of the mean squared errors: {}'.format(np.std(list_of_mean_squared_error)))

The mean of the mean squared errors: 33.59087348937988
The standard deviation of the mean squared errors: 19.09187054852058


Both the mean and the standard deviation of the mean squared errors are lower in case D than in cases A, B, and C, and D is the only case where the error is not very large. This suggests that adding hidden layers matters more than the other changes. It also suggests that comparing weak networks with a single hidden layer, as in the previous cases, is not a reliable basis for conclusions: the results can be unpredictable.
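
To put the four parts side by side, the reported means and standard deviations can be collected in one place and expressed relative to the baseline. The snippet below is only a convenience sketch: the numbers are copied from the outputs above, the variable names are illustrative, and the percentages are descriptive rather than a statistical test.

```python
# (mean of MSEs, std of MSEs) as printed in Parts A to D above.
reported = {
    "A": (65.38046295166015, 33.8784517631442),
    "B": (45.67293006896973, 41.925601263682424),
    "C": (39.622714080810546, 21.034297887295477),
    "D": (33.59087348937988, 19.09187054852058),
}

baseline_mean = reported["A"][0]
changes = {}
for part, (mean_mse, std_mse) in reported.items():
    # Relative change of the mean MSE with respect to Part A, in percent.
    changes[part] = 100.0 * (mean_mse - baseline_mean) / baseline_mean
    print(f"{part}: mean={mean_mse:.2f}  std={std_mse:.2f}  "
          f"change vs A={changes[part]:+.1f}%")
```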