<h1 align=center><font size = 5>F. Manja Submission: Regression Models with Keras</font></h1>

## Introduction

In this notebook, we will use the Keras library to build regression models.

## Download and Clean Dataset

Let's start by importing the <em>pandas</em> and the Numpy libraries.

In [1]:
import pandas as pd
import numpy as np

We will be playing around with the same dataset that we used in the videos.

<strong>The dataset is about the compressive strength of different samples of concrete based on the volumes of the different ingredients that were used to make them. Ingredients include:</strong>

<strong>1. Cement</strong>

<strong>2. Blast Furnace Slag</strong>

<strong>3. Fly Ash</strong>

<strong>4. Water</strong>

<strong>5. Superplasticizer</strong>

<strong>6. Coarse Aggregate</strong>

<strong>7. Fine Aggregate</strong>

Let's download the data and read it into a <em>pandas</em> dataframe.

In [2]:
concrete_data = pd.read_csv('https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DL0101EN/labs/data/concrete_data.csv')
concrete_data.head()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age,Strength
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28,79.99
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28,61.89
2,332.5,142.5,0.0,228.0,0.0,932.0,594.0,270,40.27
3,332.5,142.5,0.0,228.0,0.0,932.0,594.0,365,41.05
4,198.6,132.4,0.0,192.0,0.0,978.4,825.5,360,44.3


Let's see shape of data

In [3]:
concrete_data.shape

(1030, 9)

Let's check the dataset for any missing values.

In [4]:
concrete_data.isnull().sum()

Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64

The data looks very clean and is ready to be used to build our model.

#### Split data into predictors and target

The target variable in this problem is the concrete sample strength. Therefore, our predictors will be all the other columns.

In [5]:
concrete_data_columns = concrete_data.columns

predictors = concrete_data[concrete_data_columns[concrete_data_columns != 'Strength']] # all columns except Strength
target = concrete_data['Strength'] # Strength column

<a id="item2"></a>

Let's do a quick sanity check of the predictors and the target dataframes.

In [6]:
predictors.head()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28
2,332.5,142.5,0.0,228.0,0.0,932.0,594.0,270
3,332.5,142.5,0.0,228.0,0.0,932.0,594.0,365
4,198.6,132.4,0.0,192.0,0.0,978.4,825.5,360


In [7]:
target.head()

0    79.99
1    61.89
2    40.27
3    41.05
4    44.30
Name: Strength, dtype: float64

Finally, the last step is to normalize the predictor data by substracting the mean and dividing by the standard deviation.

In [8]:
predictors_norm = (predictors - predictors.mean()) / predictors.std()
predictors_norm.head()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age
0,2.476712,-0.856472,-0.846733,-0.916319,-0.620147,0.862735,-1.217079,-0.279597
1,2.476712,-0.856472,-0.846733,-0.916319,-0.620147,1.055651,-1.217079,-0.279597
2,0.491187,0.79514,-0.846733,2.174405,-1.038638,-0.526262,-2.239829,3.55134
3,0.491187,0.79514,-0.846733,2.174405,-1.038638,-0.526262,-2.239829,5.055221
4,-0.790075,0.678079,-0.846733,0.488555,-1.038638,0.070492,0.647569,4.976069


Let's save the number of predictors to *n_cols* since we will need this number when building our network.

In [9]:
n_cols = predictors_norm.shape[1] # number of predictors

In [10]:
import sklearn.model_selection as model_selection

X_train, X_test, y_train, y_test = model_selection.train_test_split(predictors, target, train_size=0.70,test_size=0.3)


<a id="item1"></a>

<a id='item32'></a>

## Import Keras

We used TensorFlow as the backend to install Keras, so it should clearly print that when we import Keras.

#### Let's go ahead and import the Keras library

In [11]:
import keras

Using TensorFlow backend.


As you can see, the TensorFlow backend was used to install the Keras library.

Let's import the rest of the packages from the Keras library that we will need to build our regressoin model.

In [12]:
from keras.models import Sequential
from keras.layers import Dense

<a id='item33'></a>

## Part A. Build a Neural (Network Baseline)

Let's define a function that defines our regression model for us so that we can conveniently call it to create our model.

In [13]:
# define regression model
def regression_model():
    # create model
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(1))
    
    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model

The above function create a model that has two hidden layers, each of 50 hidden units.

<a id="item4"></a>

<a id='item34'></a>

## Train and Test the Network

Let's call the function now to create our model.

In [14]:
# build the model
model = regression_model()

Next, we will train and test the model at the same time using the *fit* method. We will leave out 30% of the data for validation and we will train the model for 100 epochs.

In [15]:
mean_squared_errors = []
for iter in range(50):
    # fit the model
    model.fit(X_train, y_train, validation_split=0.3, epochs=50, verbose=0)
    # evaluate the model
    scores = model.evaluate(X_test, y_test, verbose=0)
    mean_squared_errors.append(scores)
    print('Iteration: {}, Mean Squared Error: {}'.format(iter,scores))  

Iteration: 0, Mean Squared Error: 119.08858838127655
Iteration: 1, Mean Squared Error: 109.23927682425983
Iteration: 2, Mean Squared Error: 110.62815348307292
Iteration: 3, Mean Squared Error: 109.506158007773
Iteration: 4, Mean Squared Error: 110.95111237066078
Iteration: 5, Mean Squared Error: 110.39537579187683
Iteration: 6, Mean Squared Error: 111.58564662007453
Iteration: 7, Mean Squared Error: 111.63172287925548
Iteration: 8, Mean Squared Error: 114.71439361572266
Iteration: 9, Mean Squared Error: 122.90328643700066
Iteration: 10, Mean Squared Error: 114.82957367140884
Iteration: 11, Mean Squared Error: 110.25210813256915
Iteration: 12, Mean Squared Error: 115.31798600378931
Iteration: 13, Mean Squared Error: 108.96239891175699
Iteration: 14, Mean Squared Error: 112.43585402602902
Iteration: 15, Mean Squared Error: 113.01621298188145
Iteration: 16, Mean Squared Error: 129.0523644851635
Iteration: 17, Mean Squared Error: 114.20447221774499
Iteration: 18, Mean Squared Error: 109.01

In [16]:
import statistics 
m = statistics.mean(mean_squared_errors)
s = statistics.stdev(mean_squared_errors)
print("Report of Mean Squared Errors (Baseline)\nMean: {}, Standard deviation: {}".format(m,s))

Report of Mean Squared Errors (Baseline)
Mean: 114.67572975331525, Standard deviation: 6.178903255945665


## Part B. Train and Test the Network with Normalized data

In [17]:
X_train_norm, X_test_norm, y_train, y_test = model_selection.train_test_split(predictors_norm, target, train_size=0.70,test_size=0.30)

In [18]:
# build the model
model_norm = regression_model()

In [19]:
mean_squared_errors_norm = []
for iter in range(50):
    # fit the model
    model.fit(X_train_norm, y_train, validation_split=0.3, epochs=50, verbose=0)
    # evaluate the model
    scores = model.evaluate(X_test_norm, y_test, verbose=0)
    mean_squared_errors_norm.append(scores)
    print('Iteration: {}, Mean Squared Error: {}'.format(iter,scores)) 

Iteration: 0, Mean Squared Error: 404.2545988706323
Iteration: 1, Mean Squared Error: 174.58862521964755
Iteration: 2, Mean Squared Error: 142.48626062090727
Iteration: 3, Mean Squared Error: 131.04479698995942
Iteration: 4, Mean Squared Error: 122.42332710340185
Iteration: 5, Mean Squared Error: 115.98812071173708
Iteration: 6, Mean Squared Error: 111.66817985460597
Iteration: 7, Mean Squared Error: 108.74326354091608
Iteration: 8, Mean Squared Error: 106.14425330794745
Iteration: 9, Mean Squared Error: 103.24142428895031
Iteration: 10, Mean Squared Error: 99.1133181624428
Iteration: 11, Mean Squared Error: 93.13979897607106
Iteration: 12, Mean Squared Error: 85.99203007275233
Iteration: 13, Mean Squared Error: 78.95525436648273
Iteration: 14, Mean Squared Error: 73.2325966597375
Iteration: 15, Mean Squared Error: 69.21084014497528
Iteration: 16, Mean Squared Error: 66.61397394470413
Iteration: 17, Mean Squared Error: 62.99882401154651
Iteration: 18, Mean Squared Error: 60.26142453684

In [20]:
m_norm = statistics.mean(mean_squared_errors_norm)
s_norm = statistics.stdev(mean_squared_errors_norm)
print("Report of Mean Squared Errors (Normalized Data) \nMean: {}, Standard deviation: {}".format(m_norm,s_norm))

Report of Mean Squared Errors (Normalized Data) 
Mean: 76.87614821350691, Standard deviation: 55.5411597085841


### Comparison
Part B had a lower mean but higher standard deviation of mean squared errors than Part A.

## Part C. Increate the number of epochs 

In [21]:
# build the model
model = regression_model()
#Use 100 epochs
mean_squared_errors_norm_100_epochs = []
for iter in range(50):
    # fit the model
    model.fit(X_train_norm, y_train, validation_split=0.3, epochs=100, verbose=0)
    # evaluate the model
    scores = model.evaluate(X_test_norm, y_test, verbose=0)
    mean_squared_errors_norm_100_epochs.append(scores)
    print('Iteration: {}, Mean Squared Error: {}'.format(iter,scores)) 

Iteration: 0, Mean Squared Error: 174.25027367057925
Iteration: 1, Mean Squared Error: 126.54502700756282
Iteration: 2, Mean Squared Error: 92.8033596890644
Iteration: 3, Mean Squared Error: 68.95629132218346
Iteration: 4, Mean Squared Error: 60.86431026767373
Iteration: 5, Mean Squared Error: 58.946939425175245
Iteration: 6, Mean Squared Error: 57.725301489475086
Iteration: 7, Mean Squared Error: 56.81174465290551
Iteration: 8, Mean Squared Error: 56.097227744685796
Iteration: 9, Mean Squared Error: 55.54263443931407
Iteration: 10, Mean Squared Error: 55.23093930340122
Iteration: 11, Mean Squared Error: 54.82446785343503
Iteration: 12, Mean Squared Error: 54.509078769622114
Iteration: 13, Mean Squared Error: 54.48396095103045
Iteration: 14, Mean Squared Error: 54.34634654961744
Iteration: 15, Mean Squared Error: 54.30112040004298
Iteration: 16, Mean Squared Error: 54.27587928895426
Iteration: 17, Mean Squared Error: 54.19734491885287
Iteration: 18, Mean Squared Error: 54.2314000175994

In [22]:
m_norm = statistics.mean(mean_squared_errors_norm_100_epochs)
s_norm = statistics.stdev(mean_squared_errors_norm_100_epochs)
print("Report of Mean Squared Errors (Normalized Data Increased Epochs) \nMean: {}, Standard deviation: {}".format(m_norm,s_norm))

Report of Mean Squared Errors (Normalized Data Increased Epochs) 
Mean: 58.87998708459552, Standard deviation: 20.404096566978357


### Comparison
Part C had a low mean and standard deviation of mean squared errors than Part B.

## Part D. Increase the number of hidden layers to 3

In [23]:
# define regression model
def regression_model_3_hidden():
    # create model
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(1))
    
    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model

In [24]:
# build the model
model_norm_3_hidden = regression_model_3_hidden()

In [25]:
mean_squared_errors_norm_3_hidden = []
for iter in range(50):
    # fit the model
    model_norm_3_hidden.fit(X_train_norm, y_train, validation_split=0.3, epochs=50, verbose=0)
    # evaluate the model
    scores_3_hidden = model_norm_3_hidden.evaluate(X_test_norm, y_test, verbose=0)
    mean_squared_errors_norm_3_hidden.append(scores)
    print('Iteration: {}, Mean Squared Error: {}'.format(iter,scores_3_hidden)) 

Iteration: 0, Mean Squared Error: 119.03568640650283
Iteration: 1, Mean Squared Error: 92.4856045423588
Iteration: 2, Mean Squared Error: 84.75809762701633
Iteration: 3, Mean Squared Error: 79.66866855559611
Iteration: 4, Mean Squared Error: 74.55992923971132
Iteration: 5, Mean Squared Error: 58.7219694810392
Iteration: 6, Mean Squared Error: 50.56752114157075
Iteration: 7, Mean Squared Error: 48.236611684163414
Iteration: 8, Mean Squared Error: 46.93101792813891
Iteration: 9, Mean Squared Error: 46.07208142079968
Iteration: 10, Mean Squared Error: 45.21188890278147
Iteration: 11, Mean Squared Error: 43.99983394338861
Iteration: 12, Mean Squared Error: 43.67819217570777
Iteration: 13, Mean Squared Error: 43.515211062138135
Iteration: 14, Mean Squared Error: 42.958496513490154
Iteration: 15, Mean Squared Error: 42.59388044042495
Iteration: 16, Mean Squared Error: 42.30062560047532
Iteration: 17, Mean Squared Error: 42.207236404172036
Iteration: 18, Mean Squared Error: 40.88560598182061


In [26]:
m_norm_3_hidden = statistics.mean(mean_squared_errors_norm_3_hidden)
s_norm_3_hidden = statistics.stdev(mean_squared_errors_norm_3_hidden)
print("Report of Mean Squared Errors (Normalized Data, 3 Hidden Layers) \nMean: {}, Standard deviation: {}".format(m_norm_3_hidden,s_norm_3_hidden))

Report of Mean Squared Errors (Normalized Data, 3 Hidden Layers) 
Mean: 52.49241435720697, Standard deviation: 0.0


### Comparison
Part D had a lower mean and standard deviation of mean squared errors than Part B.