## Regression Models with Keras

### Objectives for this Notebook
1. Use the Keras library to build a regression model.
2. Download and clean the dataset.
3. Build a neural network.
4. Train and test the network.

### Table of Contents

1. Download and Clean Dataset
2. Import Keras
3. Build a Neural Network
4. Train and Test the Network

The dataset records the compressive strength of different concrete samples together with the amounts of the ingredients used to make them. Each sample also has an `age` column. The ingredients are:

1. Cement

2. Blast Furnace Slag

3. Fly Ash

4. Water

5. Superplasticizer

6. Coarse Aggregate

7. Fine Aggregate

### Load and Clean the Data

In [24]:
import pandas as pd
import numpy as np
import keras
from keras.models import Sequential
from keras.layers import Dense

In [6]:
#df = pd.read_csv('https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DL0101EN/labs/data/concrete_data.csv')
#df.head()

In [8]:
df = pd.read_csv('concrete_data.csv')
df.head()

|   | cement | blast_furnace_slag | fly_ash | water | superplasticizer | coarse_aggregate | fine_aggregate | age | concrete_compressive_strength |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 | 79.99 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 | 61.89 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 40.27 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 | 41.05 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 | 44.3 |


### Check how many data points there are

In [10]:
print(f'Total number of rows: {df.shape[0]}')
print(f'Total number of columns: {df.shape[1]}')

Total number of rows: 1030
Total number of columns: 9


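#### Check the dataset for any missing values

The `count` row of `df.describe()` reports the number of non-null values in each column. In the output below every column has a count of 1030, which matches the number of rows, so no values are missing.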

In [12]:
df.describe()

|   | cement | blast_furnace_slag | fly_ash | water | superplasticizer | coarse_aggregate | fine_aggregate | age | concrete_compressive_strength |
|---|---|---|---|---|---|---|---|---|---|
| count | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 |
| mean | 281.167864 | 73.895825 | 54.18835 | 181.567282 | 6.20466 | 972.918932 | 773.580485 | 45.662136 | 35.817961 |
| std | 104.506364 | 86.279342 | 63.997004 | 21.354219 | 5.973841 | 77.753954 | 80.17598 | 63.169912 | 16.705742 |
| min | 102.0 | 0.0 | 0.0 | 121.8 | 0.0 | 801.0 | 594.0 | 1.0 | 2.33 |
| 25% | 192.375 | 0.0 | 0.0 | 164.9 | 0.0 | 932.0 | 730.95 | 7.0 | 23.71 |
| 50% | 272.9 | 22.0 | 0.0 | 185.0 | 6.4 | 968.0 | 779.5 | 28.0 | 34.445 |
| 75% | 350.0 | 142.95 | 118.3 | 192.0 | 10.2 | 1029.4 | 824.0 | 56.0 | 46.135 |
| max | 540.0 | 359.4 | 200.1 | 247.0 | 32.2 | 1145.0 | 992.6 | 365.0 | 82.6 |


# df.isnull().sum()
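
The commented-out call above counts missing values directly. The following is a minimal sketch of how that check behaves, using a small hypothetical frame (`toy`, with one value deliberately set to `NaN`) rather than the concrete dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame for illustration only; not the concrete dataset.
toy = pd.DataFrame({
    "cement": [540.0, np.nan, 332.5],
    "water": [162.0, 162.0, 228.0],
})

# Number of missing values in each column.
missing_per_column = toy.isnull().sum()
print(missing_per_column)

# Single yes/no answer: does the frame contain any missing value at all?
has_missing = bool(toy.isnull().values.any())
print(has_missing)
```

Run against `df`, the same two calls would be expected to report zero missing values per column, consistent with the counts from `df.describe()`.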

#### Split data into predictors and target

In [15]:
df_columns = df.columns
df_columns

Index(['cement', 'blast_furnace_slag', 'fly_ash', 'water', 'superplasticizer',
       'coarse_aggregate', 'fine_aggregate ', 'age',
       'concrete_compressive_strength'],
      dtype='object')
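
Note that one column name in the output above, `'fine_aggregate '`, ends with a space, so `df['fine_aggregate']` would raise a `KeyError`. One possible way to guard against this is to strip whitespace from the column labels. The sketch below uses a hypothetical two-column frame (`toy`) to show the idea; it is not part of the original notebook:

```python
import pandas as pd

# Hypothetical toy frame reproducing the trailing space in the column label.
toy = pd.DataFrame({"fine_aggregate ": [676.0, 594.0], "age": [28, 270]})

# rename() applies the function to every column label; str.strip removes
# leading and trailing whitespace and returns a new frame.
cleaned = toy.rename(columns=str.strip)

print(list(toy.columns))
print(list(cleaned.columns))
```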

#### Predictor/Feature Values

In [17]:
predictors = df[df_columns[df_columns != 'concrete_compressive_strength']]

In [18]:
predictors.head()

|   | cement | blast_furnace_slag | fly_ash | water | superplasticizer | coarse_aggregate | fine_aggregate | age |
|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 |


#### Target Value

In [19]:
target = df['concrete_compressive_strength']

In [20]:
target.head()

0    79.99
1    61.89
2    40.27
3    41.05
4    44.30
Name: concrete_compressive_strength, dtype: float64
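
An alternative way to express the same predictor/target split is `DataFrame.drop`. This is only a sketch on a hypothetical three-column frame (`toy`); the variable names `toy_predictors` and `toy_target` are made up for the example:

```python
import pandas as pd

# Hypothetical toy frame with two features and the target column.
toy = pd.DataFrame({
    "cement": [540.0, 332.5],
    "age": [28, 270],
    "concrete_compressive_strength": [79.99, 40.27],
})

target_name = "concrete_compressive_strength"

# Everything except the target column becomes the predictors.
toy_predictors = toy.drop(columns=[target_name])
# The target is the single remaining column, selected as a Series.
toy_target = toy[target_name]

print(list(toy_predictors.columns))
print(toy_target.tolist())
```

Both approaches should give the same result; `drop` simply states the intent ("all columns but this one") more directly than boolean indexing on the column index.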

#### Normalize the data by subtracting the mean and dividing by the standard deviation

In [21]:
predictors_norm = (predictors - predictors.mean())/predictors.std()
predictors_norm

|   | cement | blast_furnace_slag | fly_ash | water | superplasticizer | coarse_aggregate | fine_aggregate | age |
|---|---|---|---|---|---|---|---|---|
| 0 | 2.476712 | -0.856472 | -0.846733 | -0.916319 | -0.620147 | 0.862735 | -1.217079 | -0.279597 |
| 1 | 2.476712 | -0.856472 | -0.846733 | -0.916319 | -0.620147 | 1.055651 | -1.217079 | -0.279597 |
| 2 | 0.491187 | 0.795140 | -0.846733 | 2.174405 | -1.038638 | -0.526262 | -2.239829 | 3.551340 |
| 3 | 0.491187 | 0.795140 | -0.846733 | 2.174405 | -1.038638 | -0.526262 | -2.239829 | 5.055221 |
| 4 | -0.790075 | 0.678079 | -0.846733 | 0.488555 | -1.038638 | 0.070492 | 0.647569 | 4.976069 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1025 | -0.045623 | 0.487998 | 0.564271 | -0.092126 | 0.451190 | -1.322363 | -0.065861 | -0.279597 |
| 1026 | 0.392628 | -0.856472 | 0.959602 | 0.675872 | 0.702285 | -1.993711 | 0.496651 | -0.279597 |
| 1027 | -1.269472 | 0.759210 | 0.850222 | 0.521336 | -0.017520 | -1.035561 | 0.080068 | -0.279597 |
| 1028 | -1.168042 | 1.307430 | -0.846733 | -0.279443 | 0.852942 | 0.214537 | 0.191074 | -0.279597 |
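
As a sanity check on this kind of normalization, each normalized column should end up with a mean of about 0 and a standard deviation of about 1. The sketch below verifies that on a hypothetical toy frame (`toy`), using the same expression as the cell above. Note that pandas computes `std()` with `ddof=1` (the sample standard deviation) by default.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; values are illustrative only.
toy = pd.DataFrame({
    "cement": [540.0, 332.5, 198.6],
    "age": [28.0, 270.0, 360.0],
})

# Same z-score expression as in the notebook, applied column by column.
toy_norm = (toy - toy.mean()) / toy.std()

col_means = toy_norm.mean()
col_stds = toy_norm.std()

# Means are zero up to floating-point error; standard deviations are one.
print(np.allclose(col_means, 0.0, atol=1e-12))
print(np.allclose(col_stds, 1.0))
```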


#### Save the number of predictors to n_cols

In [25]:
#Number of predictors
n_cols = predictors_norm.shape[1]

In [26]:
# Define the regression model
def regression_model():
    # Create the model: two hidden layers of 50 ReLU units and one linear output
    model = Sequential()
    model.add(Dense(50, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(50, activation='relu'))
    model.add(Dense(1))
    
    # Compile the model with the Adam optimizer and mean squared error loss
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model
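
To get a feel for the size of this network, the number of trainable parameters can be worked out by hand. Each `Dense` layer has one weight per input-unit pair plus one bias per unit. The sketch below does the arithmetic in plain Python, assuming `n_cols` is 8 (the eight predictor columns shown earlier); the helper `dense_params` is a hypothetical name used only for this example.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

n_cols = 8                 # assumption: the eight predictor columns
layer_sizes = [50, 50, 1]  # units in each Dense layer of regression_model()

total = 0
n_inputs = n_cols
for n_units in layer_sizes:
    total += dense_params(n_inputs, n_units)
    n_inputs = n_units     # this layer's output feeds the next layer

print(total)
```

Under that assumption the layers contribute 450, 2550, and 51 parameters respectively. Calling `model.summary()` on the built model is the usual way to confirm such a count.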

### Train and Test the Network

In [28]:
#Build the model
model = regression_model()
#Fit the model
model.fit(predictors_norm, target, validation_split=0.3, epochs=100, verbose=2)

Epoch 1/100
23/23 - 4s - loss: 1653.7710 - val_loss: 1139.6996 - 4s/epoch - 184ms/step
Epoch 2/100
23/23 - 0s - loss: 1524.6826 - val_loss: 1021.2117 - 133ms/epoch - 6ms/step
Epoch 3/100
23/23 - 0s - loss: 1315.7744 - val_loss: 835.3539 - 136ms/epoch - 6ms/step
Epoch 4/100
23/23 - 0s - loss: 992.4383 - val_loss: 601.1056 - 127ms/epoch - 6ms/step
Epoch 5/100
23/23 - 0s - loss: 631.5583 - val_loss: 366.2721 - 113ms/epoch - 5ms/step
Epoch 6/100
23/23 - 0s - loss: 354.0918 - val_loss: 228.6462 - 99ms/epoch - 4ms/step
Epoch 7/100
23/23 - 0s - loss: 252.6452 - val_loss: 182.8925 - 153ms/epoch - 7ms/step
Epoch 8/100
23/23 - 0s - loss: 229.9271 - val_loss: 169.2846 - 134ms/epoch - 6ms/step
Epoch 9/100
23/23 - 0s - loss: 214.2853 - val_loss: 161.4551 - 117ms/epoch - 5ms/step
Epoch 10/100
23/23 - 0s - loss: 203.3293 - val_loss: 155.3286 - 129ms/epoch - 6ms/step
Epoch 11/100
23/23 - 0s - loss: 193.7743 - val_loss: 153.9114 - 117ms/epoch - 5ms/step
Epoch 12/100
23/23 - 0s - loss: 187.8203 - val_lo

<keras.callbacks.History at 0x161011b96c0>
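
The `23/23` in the log above can be reconstructed from the arguments to `fit`. With `validation_split=0.3`, Keras holds out the last 30% of the rows (taken before any shuffling) for validation and trains on the rest, using a default `batch_size` of 32. The sketch below reproduces that arithmetic in plain Python and also converts one logged `val_loss` value from a mean squared error into a root mean squared error, which is in the same units as the target. It is an illustration of the bookkeeping, not Keras's own code.

```python
import math

n_rows = 1030            # rows in the dataset, from df.shape
validation_split = 0.3   # as passed to model.fit
batch_size = 32          # Keras default when batch_size is not given

# Rows used for training and for validation.
n_train = int(n_rows * (1 - validation_split))
n_val = n_rows - n_train

# Batches per epoch; the last batch may be smaller than batch_size.
steps_per_epoch = math.ceil(n_train / batch_size)
print(n_train, n_val, steps_per_epoch)

# val_loss logged at epoch 11 is a mean squared error; its square root
# is the typical prediction error in the units of the target.
val_mse = 153.9114
val_rmse = math.sqrt(val_mse)
print(round(val_rmse, 2))
```

Because the validation rows are simply the last rows of the data, shuffling the rows beforehand may be worth considering if the file is ordered in some meaningful way.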