# Build a Regression Model in Keras

The dataset records the compressive strength (in MPa) of concrete samples together with the quantities (in kg per m³ of mixture) of the ingredients used to make them. Each sample also has an `Age` (in days), and `Strength` is the target the models predict. The ingredients are:

1. Cement

2. Blast Furnace Slag

3. Fly Ash

4. Water

5. Superplasticizer

6. Coarse Aggregate

7. Fine Aggregate

## Download and Clean Dataset

### Let's start by importing the pandas and NumPy libraries.

In [1]:
# importing libraries
import pandas as pd
import numpy as np

In [2]:
concrete_data = pd.read_csv('concrete_data.csv')
concrete_data.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 | 79.99 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 | 61.89 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 40.27 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 | 41.05 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 | 44.3 |


In [3]:
#Let's check how many data points we have
concrete_data.shape

(1030, 9)

In [4]:
# Summary statistics; a count of 1030 in every column means no values are missing
concrete_data.describe()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| count | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 |
| mean | 281.167864 | 73.895825 | 54.18835 | 181.567282 | 6.20466 | 972.918932 | 773.580485 | 45.662136 | 35.817961 |
| std | 104.506364 | 86.279342 | 63.997004 | 21.354219 | 5.973841 | 77.753954 | 80.17598 | 63.169912 | 16.705742 |
| min | 102.0 | 0.0 | 0.0 | 121.8 | 0.0 | 801.0 | 594.0 | 1.0 | 2.33 |
| 25% | 192.375 | 0.0 | 0.0 | 164.9 | 0.0 | 932.0 | 730.95 | 7.0 | 23.71 |
| 50% | 272.9 | 22.0 | 0.0 | 185.0 | 6.4 | 968.0 | 779.5 | 28.0 | 34.445 |
| 75% | 350.0 | 142.95 | 118.3 | 192.0 | 10.2 | 1029.4 | 824.0 | 56.0 | 46.135 |
| max | 540.0 | 359.4 | 200.1 | 247.0 | 32.2 | 1145.0 | 992.6 | 365.0 | 82.6 |


In [5]:
concrete_data.isnull().sum()

Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64

## Split data into predictors and target

In [6]:
concrete_data_columns = concrete_data.columns
X = concrete_data[concrete_data_columns[concrete_data_columns != 'Strength']] # all columns except Strength
X[0:5]

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 |


In [7]:
y = concrete_data['Strength']
y[0:5]

0    79.99
1    61.89
2    40.27
3    41.05
4    44.30
Name: Strength, dtype: float64

## train_test_split

#### Randomly hold out 30% of the data as a test set.

In [8]:
# Randomly split the data into training and test sets, holding out 30% of the data for testing.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=4)
print('Train set:', X_train.shape, y_train.shape)
print('Test set:', X_test.shape, y_test.shape)

Train set: (721, 8) (721,)
Test set: (309, 8) (309,)
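The models below are trained on the full `X` and `y` with `validation_split=0.3` rather than on `X_train`/`X_test`. Keras documents that `validation_split` holds out the *last* fraction of the samples (taken before any shuffling), whereas `train_test_split` draws a random subset. A minimal NumPy sketch of both index schemes for the 1030 rows; `holdout_indices` is a hypothetical helper, and its random permutation will not reproduce the exact rows sklearn picks for `random_state=4`.

```python
import numpy as np

def holdout_indices(n_samples, holdout_fraction, shuffle, seed=None):
    """Return (train_idx, holdout_idx) for a single hold-out split.

    shuffle=True  -> random hold-out, similar in spirit to train_test_split
    shuffle=False -> the last rows are held out, as Keras' validation_split does
    """
    n_holdout = int(np.ceil(n_samples * holdout_fraction))  # ceil(0.3 * 1030) = 309
    if shuffle:
        order = np.random.default_rng(seed).permutation(n_samples)
    else:
        order = np.arange(n_samples)
    return order[:-n_holdout], order[-n_holdout:]

train_idx, test_idx = holdout_indices(1030, 0.3, shuffle=True, seed=4)
print('random split:', len(train_idx), len(test_idx))  # -> random split: 721 309

fit_idx, val_idx = holdout_indices(1030, 0.3, shuffle=False)
print('tail split:', len(fit_idx), len(val_idx), 'first validation row:', val_idx[0])
```

Both schemes give the 721/309 sizes seen in the outputs; only the tail split says which rows Keras validates on.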


In [9]:
n_cols = X.shape[1]

## Import Keras and Build a Neural Network

In [10]:
#importing keras
import keras

from keras.models import Sequential
from keras.layers import Dense

Using TensorFlow backend.


In [11]:
# define regression model
def regression_model():
    # create model
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(1))
    
    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model
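`regression_model()` stacks two ReLU hidden layers of 10 units on the 8 input features and ends in a single linear unit, trained to minimise the mean squared error. A minimal NumPy sketch of that forward pass and loss, assuming this reading of the layers; the random weights are placeholders (not Keras' initialisation) and `dense`/`forward` are hypothetical helper names.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cols = 8                          # number of input features, as in X.shape[1]
layer_sizes = [n_cols, 10, 10, 1]   # input, two hidden layers, output

# one (weights, bias) pair per Dense layer; random placeholders, not trained values
params = [(rng.normal(scale=0.1, size=(n_in, n_out)), np.zeros(n_out))
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def dense(x, w, b, relu):
    z = x @ w + b                                 # affine map, like a Dense layer
    return np.maximum(z, 0.0) if relu else z      # ReLU on hidden layers only

def forward(x):
    for i, (w, b) in enumerate(params):
        x = dense(x, w, b, relu=(i < len(params) - 1))  # output layer stays linear
    return x

x_batch = rng.normal(size=(5, n_cols))   # 5 made-up samples
y_batch = rng.normal(size=(5, 1))
y_pred = forward(x_batch)
mse = np.mean((y_pred - y_batch) ** 2)   # what loss='mean_squared_error' computes
n_params = sum(w.size + b.size for w, b in params)
print(y_pred.shape, n_params)            # (5, 1) 211
```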

## A) Without Normalising
### Train and Validate the Network

In [12]:
# build the model
model = regression_model()
model.fit(X, y, validation_split=0.3, epochs=50, verbose=2)

Train on 721 samples, validate on 309 samples
Epoch 1/50
 - 0s - loss: 11894.2931 - val_loss: 5296.3416
Epoch 2/50
 - 0s - loss: 4255.3807 - val_loss: 3038.9604
Epoch 3/50
 - 0s - loss: 2784.1707 - val_loss: 2006.0850
Epoch 4/50
 - 0s - loss: 1922.3995 - val_loss: 1344.3142
Epoch 5/50
 - 0s - loss: 1343.8554 - val_loss: 882.9199
Epoch 6/50
 - 0s - loss: 940.7072 - val_loss: 580.7288
Epoch 7/50
 - 0s - loss: 669.1725 - val_loss: 392.1823
Epoch 8/50
 - 0s - loss: 504.7656 - val_loss: 292.0699
Epoch 9/50
 - 0s - loss: 418.9829 - val_loss: 252.0972
Epoch 10/50
 - 0s - loss: 376.4648 - val_loss: 242.8946
Epoch 11/50
 - 0s - loss: 362.7223 - val_loss: 240.8482
Epoch 12/50
 - 0s - loss: 358.5357 - val_loss: 242.9572
Epoch 13/50
 - 0s - loss: 357.6861 - val_loss: 242.8407
Epoch 14/50
 - 0s - loss: 356.7823 - val_loss: 240.4671
Epoch 15/50
 - 0s - loss: 356.2434 - val_loss: 239.2432
Epoch 16/50
 - 0s - loss: 355.7295 - val_loss: 237.7194
Epoch 17/50
 - 0s - loss: 355.5799 - val_loss: 234.7848
...

<keras.callbacks.History at 0x7ff623ba0dd8>
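The logged `loss` and `val_loss` are mean squared errors, so they are in squared units of `Strength` (MPa², assuming `Strength` is in MPa as in the UCI concrete dataset). Taking the square root gives an error in the target's own units, which is easier to compare with the spread of `Strength` (a standard deviation of about 16.7 in the `describe()` table). A small sketch using the epoch-17 `val_loss` from the log above; `mse` and `rmse_from_mse` are illustrative helpers.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error, the quantity reported as loss/val_loss here."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse_from_mse(mse_value):
    return math.sqrt(mse_value)   # back to the units of the target

# formula check on made-up numbers: errors of 2 and 4 -> (4 + 16) / 2 = 10
print(mse([10.0, 20.0], [12.0, 16.0]))             # 10.0

val_loss_epoch_17 = 234.7848                        # model A, epoch 17, from the log above
print(round(rmse_from_mse(val_loss_epoch_17), 2))   # 15.32
```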

## B) Normalising the data
#### Normalise each feature by subtracting its mean and dividing by its standard deviation.

In [13]:
X_norm = (X - X.mean()) / X.std()
X_norm.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 2.476712 | -0.856472 | -0.846733 | -0.916319 | -0.620147 | 0.862735 | -1.217079 | -0.279597 |
| 1 | 2.476712 | -0.856472 | -0.846733 | -0.916319 | -0.620147 | 1.055651 | -1.217079 | -0.279597 |
| 2 | 0.491187 | 0.79514 | -0.846733 | 2.174405 | -1.038638 | -0.526262 | -2.239829 | 3.55134 |
| 3 | 0.491187 | 0.79514 | -0.846733 | 2.174405 | -1.038638 | -0.526262 | -2.239829 | 5.055221 |
| 4 | -0.790075 | 0.678079 | -0.846733 | 0.488555 | -1.038638 | 0.070492 | 0.647569 | 4.976069 |
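The normalisation above applies z = (x - mean) / std column by column. Two details are worth knowing: pandas' `DataFrame.std()` uses the sample standard deviation (`ddof=1`), while NumPy's `std` defaults to `ddof=0`; and here the statistics come from all 1030 rows, so the rows Keras later holds out for validation also influence them. A minimal NumPy sketch that fits the statistics on training rows only and reuses them for held-out rows; `fit_standardizer` and `apply_standardizer` are hypothetical names and the small array is made-up data.

```python
import numpy as np

def fit_standardizer(X_train):
    """Column means and sample standard deviations (ddof=1, matching pandas .std())."""
    return X_train.mean(axis=0), X_train.std(axis=0, ddof=1)

def apply_standardizer(X, mean, std):
    return (X - mean) / std

# made-up data: 6 rows, 2 features
X_all = np.array([[540.0, 162.0],
                  [332.5, 228.0],
                  [198.6, 192.0],
                  [266.0, 228.0],
                  [380.0, 228.0],
                  [475.0, 228.0]])
X_tr, X_ho = X_all[:4], X_all[4:]          # last rows held out, as validation_split does

mean, std = fit_standardizer(X_tr)         # statistics from the training rows only
X_tr_norm = apply_standardizer(X_tr, mean, std)
X_ho_norm = apply_standardizer(X_ho, mean, std)  # same mean/std reused for held-out rows

print(X_tr_norm.mean(axis=0).round(6), X_tr_norm.std(axis=0, ddof=1).round(6))
```

Held-out rows standardised this way will not have exactly zero mean and unit variance, which is expected: they are treated like unseen data.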


In [14]:
n_cols = X_norm.shape[1]

In [15]:
# define regression model
def regression_model():
    # create model
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(1))
    
    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model

In [16]:
# build the model
model = regression_model()
model.fit(X_norm, y, validation_split=0.3, epochs=50, verbose=2)

Train on 721 samples, validate on 309 samples
Epoch 1/50
 - 0s - loss: 1690.3133 - val_loss: 1225.8280
Epoch 2/50
 - 0s - loss: 1673.2019 - val_loss: 1212.6934
Epoch 3/50
 - 0s - loss: 1652.2956 - val_loss: 1195.5254
Epoch 4/50
 - 0s - loss: 1623.3827 - val_loss: 1170.9396
Epoch 5/50
 - 0s - loss: 1582.5234 - val_loss: 1135.7834
Epoch 6/50
 - 0s - loss: 1524.1843 - val_loss: 1089.3774
Epoch 7/50
 - 0s - loss: 1445.6546 - val_loss: 1033.0638
Epoch 8/50
 - 0s - loss: 1349.4637 - val_loss: 969.2019
Epoch 9/50
 - 0s - loss: 1240.7683 - val_loss: 897.5668
Epoch 10/50
 - 0s - loss: 1119.6058 - val_loss: 820.5203
Epoch 11/50
 - 0s - loss: 992.1742 - val_loss: 738.3292
Epoch 12/50
 - 0s - loss: 863.2795 - val_loss: 655.8260
Epoch 13/50
 - 0s - loss: 740.7022 - val_loss: 575.6974
Epoch 14/50
 - 0s - loss: 629.6087 - val_loss: 502.6178
Epoch 15/50
 - 0s - loss: 535.5789 - val_loss: 437.6821
Epoch 16/50
 - 0s - loss: 460.5111 - val_loss: 380.7139
Epoch 17/50
 - 0s - loss: 399.8883 - val_loss: 334
...

<keras.callbacks.History at 0x7ff6204d89b0>

## C) Training for 100 epochs
#### With normalised data

In [17]:
# define regression model
def regression_model():
    # create model
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(1))
    
    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model

In [18]:
# build the model
model = regression_model()
model.fit(X_norm, y, validation_split=0.3, epochs=100, verbose=2)

Train on 721 samples, validate on 309 samples
Epoch 1/100
 - 0s - loss: 1627.2076 - val_loss: 1155.8545
Epoch 2/100
 - 0s - loss: 1589.1035 - val_loss: 1125.4691
Epoch 3/100
 - 0s - loss: 1543.8601 - val_loss: 1090.2297
Epoch 4/100
 - 0s - loss: 1487.7863 - val_loss: 1046.5363
Epoch 5/100
 - 0s - loss: 1416.8832 - val_loss: 992.7140
Epoch 6/100
 - 0s - loss: 1328.5656 - val_loss: 926.1943
Epoch 7/100
 - 0s - loss: 1219.0376 - val_loss: 849.6304
Epoch 8/100
 - 0s - loss: 1092.5545 - val_loss: 765.6094
Epoch 9/100
 - 0s - loss: 957.2875 - val_loss: 675.7459
Epoch 10/100
 - 0s - loss: 815.9907 - val_loss: 585.6687
Epoch 11/100
 - 0s - loss: 683.1740 - val_loss: 499.5068
Epoch 12/100
 - 0s - loss: 563.9935 - val_loss: 425.5442
Epoch 13/100
 - 0s - loss: 467.3314 - val_loss: 365.1650
Epoch 14/100
 - 0s - loss: 395.4746 - val_loss: 316.6543
Epoch 15/100
 - 0s - loss: 343.7804 - val_loss: 282.3537
Epoch 16/100
 - 0s - loss: 310.7965 - val_loss: 256.4466
Epoch 17/100
 - 0s - loss: 286.3197
...

<keras.callbacks.History at 0x7ff6205f1278>

## D) Four hidden layers, each with 10 nodes and a ReLU activation function
#### With Normalised Data

In [19]:
# define regression model
def regression_model():
    # create model
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(1))
    
    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model
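Each extra hidden layer adds parameters. A `Dense` layer has `n_in * n_out` weights plus `n_out` biases, so the totals can be worked out by hand (they should match the "Total params" line of Keras' `model.summary()`). A short sketch; `count_dense_params` is a hypothetical helper and the layer lists are read off the model definitions above.

```python
def count_dense_params(layer_sizes):
    """Trainable parameters of a stack of fully connected layers.

    layer_sizes lists the input width followed by each Dense layer's units.
    """
    return sum(n_in * n_out + n_out   # weights + biases of one Dense layer
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

two_hidden = [8, 10, 10, 1]            # models A, B and C
four_hidden = [8, 10, 10, 10, 10, 1]   # model D
print(count_dense_params(two_hidden), count_dense_params(four_hidden))  # 211 431
```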

In [20]:
# build the model
model = regression_model()
model.fit(X_norm, y, validation_split=0.3, epochs=50, verbose=2)

Train on 721 samples, validate on 309 samples
Epoch 1/50
 - 1s - loss: 1696.5964 - val_loss: 1219.9570
Epoch 2/50
 - 0s - loss: 1673.7397 - val_loss: 1196.9527
Epoch 3/50
 - 0s - loss: 1629.0284 - val_loss: 1150.3605
Epoch 4/50
 - 0s - loss: 1536.2743 - val_loss: 1061.7384
Epoch 5/50
 - 0s - loss: 1347.7012 - val_loss: 873.6201
Epoch 6/50
 - 0s - loss: 971.2153 - val_loss: 572.5362
Epoch 7/50
 - 0s - loss: 535.9411 - val_loss: 334.3912
Epoch 8/50
 - 0s - loss: 345.5570 - val_loss: 259.6106
Epoch 9/50
 - 0s - loss: 283.1195 - val_loss: 219.4822
Epoch 10/50
 - 0s - loss: 247.9980 - val_loss: 198.2582
Epoch 11/50
 - 0s - loss: 227.7304 - val_loss: 184.7591
Epoch 12/50
 - 0s - loss: 214.7638 - val_loss: 176.1646
Epoch 13/50
 - 0s - loss: 205.5443 - val_loss: 169.3477
Epoch 14/50
 - 0s - loss: 197.9263 - val_loss: 163.6372
Epoch 15/50
 - 0s - loss: 191.3430 - val_loss: 160.6544
Epoch 16/50
 - 0s - loss: 186.0452 - val_loss: 157.0950
Epoch 17/50
 - 0s - loss: 181.4257 - val_loss: 152.9582
...

<keras.callbacks.History at 0x7ff5a9edefd0>

## E) Increasing Hidden Layers, Nodes and Epochs
#### With Normalised Data

In [22]:
# define regression model
def regression_model():
    # create model
    model = Sequential()
    model.add(Dense(30, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(30, activation='relu'))
    model.add(Dense(30, activation='relu'))
    model.add(Dense(30, activation='relu'))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(10, activation='relu'))
    model.add(Dense(30, activation='relu'))
    model.add(Dense(1))
    
    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model

# build the model
model = regression_model()
model.fit(X_norm, y, validation_split=0.3, epochs=100, verbose=2)

Train on 721 samples, validate on 309 samples
Epoch 1/100
 - 1s - loss: 1683.1973 - val_loss: 1180.6225
Epoch 2/100
 - 0s - loss: 1441.6760 - val_loss: 664.1815
Epoch 3/100
 - 0s - loss: 526.0031 - val_loss: 204.3549
Epoch 4/100
 - 0s - loss: 279.4227 - val_loss: 175.1610
Epoch 5/100
 - 0s - loss: 228.6384 - val_loss: 163.1331
Epoch 6/100
 - 0s - loss: 203.5048 - val_loss: 188.2185
Epoch 7/100
 - 0s - loss: 188.3749 - val_loss: 159.6254
Epoch 8/100
 - 0s - loss: 177.3933 - val_loss: 156.6323
Epoch 9/100
 - 0s - loss: 162.3253 - val_loss: 157.1256
Epoch 10/100
 - 0s - loss: 154.2437 - val_loss: 165.0306
Epoch 11/100
 - 0s - loss: 141.2857 - val_loss: 180.1564
Epoch 12/100
 - 0s - loss: 128.5872 - val_loss: 165.6932
Epoch 13/100
 - 0s - loss: 117.9048 - val_loss: 169.1131
Epoch 14/100
 - 0s - loss: 108.2547 - val_loss: 192.1481
Epoch 15/100
 - 0s - loss: 96.8708 - val_loss: 182.0184
Epoch 16/100
 - 0s - loss: 87.5545 - val_loss: 220.1177
Epoch 17/100
 - 0s - loss: 79.0075 - val_loss: 235
...

<keras.callbacks.History at 0x7ff5a8f33160>
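In this last run the training loss keeps falling (79.0075 at epoch 17) while `val_loss` bottoms out at 156.6323 in epoch 8 and then climbs to 220.1177 by epoch 16, a typical sign of overfitting. Keras offers `keras.callbacks.EarlyStopping(monitor='val_loss', patience=...)` for this situation; the pure-Python sketch below only replays a patience rule of that kind on the `val_loss` values logged for epochs 1 to 16 (epoch 17 is cut off in the output). `patience=5` is an arbitrary choice and `early_stopping_epoch` is a hypothetical helper.

```python
def early_stopping_epoch(val_losses, patience):
    """Replay a patience rule: stop once val_loss has not improved for `patience` epochs.

    Returns (best_epoch, stop_epoch) as 1-based epoch numbers; stop_epoch is None
    if the rule never triggers within the given losses.
    """
    best, best_epoch, wait = float('inf'), 0, 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, best_epoch, wait = loss, epoch, 0   # new best: reset the counter
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch
    return best_epoch, None

# val_loss for epochs 1-16 of the last run, copied from the log above
val_losses = [1180.6225, 664.1815, 204.3549, 175.1610, 163.1331, 188.2185,
              159.6254, 156.6323, 157.1256, 165.0306, 180.1564, 165.6932,
              169.1131, 192.1481, 182.0184, 220.1177]
print(early_stopping_epoch(val_losses, patience=5))   # (8, 13)
```

With `restore_best_weights=True`, the Keras callback would additionally roll the model back to the weights from the best epoch.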