

<h1 align=center><font size = 5>Regression Models with Keras</font></h1>


# Regression project

<a id="item31"></a>


## Download and Clean Dataset


Let's start by importing the <em>pandas</em> and NumPy libraries.


In [53]:
# All Libraries required for this lab are listed below. The libraries pre-installed on Skills Network Labs are commented. 
# If you run this notebook on a different environment, e.g. your desktop, you may need to uncomment and install certain libraries.

#!pip install numpy==1.21.4
#!pip install pandas==1.3.4
#!pip install keras==2.1.6

In [54]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split


<strong>The dataset is about the compressive strength of different samples of concrete based on the volumes of the different ingredients that were used to make them. Ingredients include:</strong>

<strong>1. Cement</strong>

<strong>2. Blast Furnace Slag</strong>

<strong>3. Fly Ash</strong>

<strong>4. Water</strong>

<strong>5. Superplasticizer</strong>

<strong>6. Coarse Aggregate</strong>

<strong>7. Fine Aggregate</strong>


Let's download the data and read it into a <em>pandas</em> dataframe.


In [55]:
concrete_data = pd.read_csv('https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DL0101EN/labs/data/concrete_data.csv')
concrete_data.head()

|  | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 | 79.99 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 | 61.89 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 40.27 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 | 41.05 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 | 44.3 |


#### Let's check how many data points we have.


In [56]:
concrete_data.shape

(1030, 9)

Let's look at summary statistics and then check the dataset for any missing values.


In [57]:
concrete_data.describe()

|  | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| count | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 |
| mean | 281.167864 | 73.895825 | 54.18835 | 181.567282 | 6.20466 | 972.918932 | 773.580485 | 45.662136 | 35.817961 |
| std | 104.506364 | 86.279342 | 63.997004 | 21.354219 | 5.973841 | 77.753954 | 80.17598 | 63.169912 | 16.705742 |
| min | 102.0 | 0.0 | 0.0 | 121.8 | 0.0 | 801.0 | 594.0 | 1.0 | 2.33 |
| 25% | 192.375 | 0.0 | 0.0 | 164.9 | 0.0 | 932.0 | 730.95 | 7.0 | 23.71 |
| 50% | 272.9 | 22.0 | 0.0 | 185.0 | 6.4 | 968.0 | 779.5 | 28.0 | 34.445 |
| 75% | 350.0 | 142.95 | 118.3 | 192.0 | 10.2 | 1029.4 | 824.0 | 56.0 | 46.135 |
| max | 540.0 | 359.4 | 200.1 | 247.0 | 32.2 | 1145.0 | 992.6 | 365.0 | 82.6 |


In [58]:
concrete_data.isnull().sum()

Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64

The data looks very clean and is ready to be used to build our model.


#### Split data into predictors and target


The target variable in this problem is the concrete sample strength. Therefore, our predictors will be all the other columns.


In [59]:
concrete_data_columns = concrete_data.columns

predictors = concrete_data[concrete_data_columns[concrete_data_columns != 'Strength']] # all columns except Strength
target = concrete_data['Strength'] # Strength column

<a id="item2"></a>


Let's do a quick sanity check of the predictors dataframe and the target series.


In [60]:
predictors.head()

|  | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 |


In [61]:
target.head()

0    79.99
1    61.89
2    40.27
3    41.05
4    44.30
Name: Strength, dtype: float64

A common last step is to normalize the data by subtracting the mean and dividing by the standard deviation. Note that the cells in this notebook train on the unnormalized predictors.
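
No cell here applies the normalization just described, so here is a minimal sketch of what it could look like with pandas. The small dataframe `toy_predictors` is made up for illustration and stands in for the real `predictors`. The name `toy_predictors_norm` is likewise an assumption.

```python
import pandas as pd

# Hypothetical toy data standing in for the real `predictors` dataframe.
toy_predictors = pd.DataFrame({
    'Cement': [540.0, 332.5, 198.6],
    'Water': [162.0, 228.0, 192.0],
})

# Subtract each column's mean and divide by its (sample) standard deviation.
toy_predictors_norm = (toy_predictors - toy_predictors.mean()) / toy_predictors.std()

print(toy_predictors_norm.mean().round(6))  # each column mean is ~0
print(toy_predictors_norm.std().round(6))   # each column std is ~1
```

When a held-out test set is used, the mean and standard deviation are usually computed on the training split only and then reused to scale the test split.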


Let's save the number of predictors to *n_cols* since we will need this number when building our network.


In [62]:
n_cols = predictors.shape[1] # number of predictors

<a id="item1"></a>


<a id='item32'></a>


## Import Keras


In [63]:
import keras

In [64]:
from keras.models import Sequential
from keras.layers import Dense

<a id='item33'></a>


## Build a Neural Network


Let's define a function that builds and compiles our regression model so that we can conveniently call it whenever we need a new model.


In [65]:
# define regression model
def regression_model():
    # create model
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    #model.add(Dense(50, activation='relu'))
    model.add(Dense(1))
    
    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model

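The above function creates a model with one hidden layer of 10 units with ReLU activation, followed by a single-unit output layer. A second hidden layer of 50 units is present in the code but commented out.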
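
To make the architecture concrete, the number of trainable parameters for the model as coded above (a `Dense(10)` hidden layer and a `Dense(1)` output layer) can be worked out by hand. The sketch below assumes `n_cols = 8`, matching the eight predictor columns shown earlier, and uses a hypothetical helper `dense_params`. A fully connected layer has one weight per input-unit pair plus one bias per unit.

```python
def dense_params(n_inputs, n_units):
    """Trainable parameters in a fully connected layer: weights plus biases."""
    return n_inputs * n_units + n_units

n_cols = 8  # assumption: the eight predictor columns (all columns except Strength)

hidden = dense_params(n_cols, 10)  # Dense(10, activation='relu')
output = dense_params(10, 1)       # Dense(1)
total = hidden + output

print(hidden, output, total)  # 90 11 101
```

In the notebook itself, calling `model.summary()` on the built model should report the same counts.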


<a id="item4"></a>


<a id='item34'></a>


## Train and Test the Network


Let's call the function now to create our model.


In [66]:
# build the model
model = regression_model()

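Next, we will train and test the model at the same time using the *fit* method. In each of 50 cycles, we randomly hold out 30% of the data for validation and train the model for 50 epochs. Because the model is created once, outside the loop, each cycle continues training from the previous cycle's weights rather than starting over, and samples held out in one cycle may already have been used for training in an earlier one.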


In [67]:
list_of_mean_squared_error = []
for cycle in range(50):
    # Randomly split the data into a training set (70%) and a test set (30%).
    X_train, X_test, y_train, y_test = train_test_split(predictors, target, test_size=0.3)
    # Train for 50 epochs and evaluate on the test set after each epoch.
    # Note: the same model object is reused, so training continues across cycles.
    res = model.fit(X_train, y_train, epochs=50, verbose=0, validation_data=(X_test, y_test))
    # The validation loss after the final epoch is the test-set mean squared error.
    mean_squared_error = res.history['val_loss'][-1]
    # Record the error for this cycle.
    list_of_mean_squared_error.append(mean_squared_error)
    print('Cycle #{}: mean_squared_error {}'.format(cycle+1, mean_squared_error))

Cycle #1: mean_squared_error 1009.8582198751011
Cycle #2: mean_squared_error 261.10693922320615
Cycle #3: mean_squared_error 140.62821395034544
Cycle #4: mean_squared_error 132.19264781667962
Cycle #5: mean_squared_error 127.49585353517995
Cycle #6: mean_squared_error 115.54036594131618
Cycle #7: mean_squared_error 117.56164000180932
Cycle #8: mean_squared_error 117.03228485699996
Cycle #9: mean_squared_error 128.51724919686424
Cycle #10: mean_squared_error 112.42986700064156
Cycle #11: mean_squared_error 115.65251372007104
Cycle #12: mean_squared_error 124.87427231871966
Cycle #13: mean_squared_error 100.98380398055882
Cycle #14: mean_squared_error 98.15874305897931
Cycle #15: mean_squared_error 123.15417915023261
Cycle #16: mean_squared_error 102.59011198864786
Cycle #17: mean_squared_error 128.65243056213973
Cycle #18: mean_squared_error 112.3562950208349
Cycle #19: mean_squared_error 97.58293803687235
Cycle #20: mean_squared_error 101.20116061608768
Cycle #21: mean_squared_error 11
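
Because `model` is created once before the loop, each cycle above continues training the same network, so the per-cycle errors are not independent estimates. One alternative is to build a fresh model inside the loop. The sketch below illustrates only that loop structure on synthetic data, with an ordinary least-squares fit in NumPy standing in for the Keras network. The helpers `make_split`, `fit_linear`, and `predict`, and the synthetic data, are all assumptions for illustration and not part of the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumption): 200 samples, 3 predictors, noisy linear target.
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)

def make_split(n_samples, test_size, rng):
    """Return shuffled (train, test) index arrays."""
    idx = rng.permutation(n_samples)
    n_test = int(round(n_samples * test_size))
    return idx[n_test:], idx[:n_test]

def fit_linear(X_train, y_train):
    """Fit a fresh least-squares model (stand-in for building and fitting the network)."""
    A = np.column_stack([X_train, np.ones(len(X_train))])  # add intercept column
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return coef

def predict(coef, X_new):
    """Apply the fitted coefficients, including the intercept, to new rows."""
    return np.column_stack([X_new, np.ones(len(X_new))]) @ coef

mse_per_cycle = []
for cycle in range(50):
    train_idx, test_idx = make_split(len(X), 0.3, rng)
    coef = fit_linear(X[train_idx], y[train_idx])  # new model every cycle
    residuals = y[test_idx] - predict(coef, X[test_idx])
    mse_per_cycle.append(float(np.mean(residuals ** 2)))

print(len(mse_per_cycle), np.mean(mse_per_cycle), np.std(mse_per_cycle))
```

In the Keras version, the equivalent change would be to call `regression_model()` inside the loop, before `fit`, so that every cycle starts from newly initialized weights.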

In [68]:
# Create a DataFrame to display the mean squared error values in a table
df = pd.DataFrame({'Cycle #': range(1, 51), 'Mean Squared Error': list_of_mean_squared_error})
print(df)

    Cycle #  Mean Squared Error
0         1         1009.858220
1         2          261.106939
2         3          140.628214
3         4          132.192648
4         5          127.495854
5         6          115.540366
6         7          117.561640
7         8          117.032285
8         9          128.517249
9        10          112.429867
10       11          115.652514
11       12          124.874272
12       13          100.983804
13       14           98.158743
14       15          123.154179
15       16          102.590112
16       17          128.652431
17       18          112.356295
18       19           97.582938
19       20          101.201161
20       21          113.189056
21       22           80.117280
22       23           69.519097
23       24           64.521720
24       25           53.611504
25       26           50.362337
26       27           60.593435
27       28           54.051388
28       29           50.989902
29       30           51.430613
30      

In [69]:
print('The mean of the MSE: {}'.format(np.mean(list_of_mean_squared_error)))
print('The standard deviation of the MSE: {}'.format(np.std(list_of_mean_squared_error)))

The mean of the MSE: 100.84555260803323
The standard deviation of the MSE: 135.99986494882347
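
The standard deviation is larger than the mean largely because the first cycles, when the network has barely trained, produce much larger errors than the rest. As a rough illustration, the sketch below uses only the first ten values printed above, rounded to three decimals. It compares the summary with and without the first two cycles, and converts an MSE to a root mean squared error, which is in the same units as the target. Dropping exactly two cycles is an arbitrary assumption made for illustration.

```python
import numpy as np

# First ten per-cycle validation MSE values from the output above, rounded.
mse = np.array([1009.858, 261.107, 140.628, 132.193, 127.496,
                115.540, 117.562, 117.032, 128.517, 112.430])

print('all ten cycles:    mean={:.1f}, std={:.1f}'.format(mse.mean(), mse.std()))
print('without first two: mean={:.1f}, std={:.1f}'.format(mse[2:].mean(), mse[2:].std()))

# RMSE has the same units as Strength, which makes it easier to interpret.
rmse = np.sqrt(mse[2:].mean())
print('RMSE without first two: {:.2f}'.format(rmse))
```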
