<a id="item31"></a>


## Analysing the concrete data set using a regression NN model


Let's start by importing the <em>pandas</em> and NumPy libraries.


In [208]:
import pandas as pd
import numpy as np

We will be playing around with the same dataset that we used in the videos.

<strong>The dataset is about the compressive strength of different samples of concrete based on the volumes of the different ingredients that were used to make them. Ingredients include:</strong>

<strong>1. Cement</strong>

<strong>2. Blast Furnace Slag</strong>

<strong>3. Fly Ash</strong>

<strong>4. Water</strong>

<strong>5. Superplasticizer</strong>

<strong>6. Coarse Aggregate</strong>

<strong>7. Fine Aggregate</strong>


Let's download the data and read it into a <em>pandas</em> dataframe.


In [209]:
concrete_data = pd.read_csv('https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DL0101EN/labs/data/concrete_data.csv')
concrete_data.head()
#print(type(concrete_data))

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age,Strength
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28,79.99
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28,61.89
2,332.5,142.5,0.0,228.0,0.0,932.0,594.0,270,40.27
3,332.5,142.5,0.0,228.0,0.0,932.0,594.0,365,41.05
4,198.6,132.4,0.0,192.0,0.0,978.4,825.5,360,44.3


So the first concrete sample has 540 cubic meters of cement, 0 cubic meters of blast furnace slag, 0 cubic meters of fly ash, 162 cubic meters of water, 2.5 cubic meters of superplasticizer, 1040 cubic meters of coarse aggregate, and 676 cubic meters of fine aggregate. This concrete mix, which is 28 days old, has a compressive strength of 79.99 MPa.


#### Let's check how many data points we have.


In [210]:
concrete_data.shape

(1030, 9)

So, there are approximately 1000 samples to train our model on. With so few samples, we have to be careful not to overfit the training data.


Let's look at the summary statistics and then check the dataset for any missing values.


In [211]:
concrete_data.describe()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age,Strength
count,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0
mean,281.167864,73.895825,54.18835,181.567282,6.20466,972.918932,773.580485,45.662136,35.817961
std,104.506364,86.279342,63.997004,21.354219,5.973841,77.753954,80.17598,63.169912,16.705742
min,102.0,0.0,0.0,121.8,0.0,801.0,594.0,1.0,2.33
25%,192.375,0.0,0.0,164.9,0.0,932.0,730.95,7.0,23.71
50%,272.9,22.0,0.0,185.0,6.4,968.0,779.5,28.0,34.445
75%,350.0,142.95,118.3,192.0,10.2,1029.4,824.0,56.0,46.135
max,540.0,359.4,200.1,247.0,32.2,1145.0,992.6,365.0,82.6


In [212]:
concrete_data.isnull().sum()

Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64

The data looks very clean and is ready to be used to build our model.


#### Split data into predictors and target


The target variable in this problem is the concrete sample strength. Therefore, our predictors will be all the other columns.


In [213]:
df_columns = concrete_data.columns

# select all columns except Strength
predictors = concrete_data.loc[:, df_columns != 'Strength']
print(predictors.shape)
target = concrete_data['Strength'] # Strength column
print(target.shape)

(1030, 8)
(1030,)
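The boolean-mask selection above can also be written with `DataFrame.drop`. The following is a minimal sketch on a toy frame (the name `toy` and its three columns are only for illustration; the values are copied from the first rows shown earlier), checking that both approaches select the same predictors.

```python
import pandas as pd

# Toy stand-in for concrete_data: two predictor columns plus the target.
toy = pd.DataFrame({
    "Cement": [540.0, 540.0, 332.5],
    "Water": [162.0, 162.0, 228.0],
    "Strength": [79.99, 61.89, 40.27],
})

# Boolean-mask selection, as in the cell above.
predictors_mask = toy.loc[:, toy.columns != "Strength"]

# Equivalent selection that names the column to leave out.
predictors_drop = toy.drop(columns=["Strength"])
target_toy = toy["Strength"]

print(predictors_mask.equals(predictors_drop))
print(predictors_drop.shape, target_toy.shape)
```

Either form works; `drop` tends to read more directly when only one column has to be excluded.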


<a id="item2"></a>


Let's do a quick sanity check of the predictors dataframe and the target series.


In [214]:
predictors.head()
#predictors.mean()
#print(predictors.std())

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28
2,332.5,142.5,0.0,228.0,0.0,932.0,594.0,270
3,332.5,142.5,0.0,228.0,0.0,932.0,594.0,365
4,198.6,132.4,0.0,192.0,0.0,978.4,825.5,360


In [215]:
target.head()

0    79.99
1    61.89
2    40.27
3    41.05
4    44.30
Name: Strength, dtype: float64

In [216]:
# Finally, the last step is to normalize the data by subtracting the mean and dividing by the standard deviation.
predictors_norm = (predictors - predictors.mean()) / predictors.std()
predictors_norm.head()


Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age
0,2.476712,-0.856472,-0.846733,-0.916319,-0.620147,0.862735,-1.217079,-0.279597
1,2.476712,-0.856472,-0.846733,-0.916319,-0.620147,1.055651,-1.217079,-0.279597
2,0.491187,0.79514,-0.846733,2.174405,-1.038638,-0.526262,-2.239829,3.55134
3,0.491187,0.79514,-0.846733,2.174405,-1.038638,-0.526262,-2.239829,5.055221
4,-0.790075,0.678079,-0.846733,0.488555,-1.038638,0.070492,0.647569,4.976069

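As a quick check on the normalization step, each normalized column should end up with a mean of about 0 and a standard deviation of about 1. Here is a small sketch on a toy frame (the name `toy` is an assumption; the values are taken from the first rows of the Cement and Age columns above).

```python
import numpy as np
import pandas as pd

# Toy stand-in for the predictors dataframe.
toy = pd.DataFrame({
    "Cement": [540.0, 540.0, 332.5, 332.5, 198.6],
    "Age": [28.0, 28.0, 270.0, 365.0, 360.0],
})

# Same z-score formula as in the cell above.
# pandas .std() uses the sample standard deviation (ddof=1) by default.
toy_norm = (toy - toy.mean()) / toy.std()

col_means = toy_norm.mean()
col_stds = toy_norm.std()
print(col_means)
print(col_stds)
```

Note that the statistics here, as in the notebook, are computed over all rows. A stricter setup would compute the mean and standard deviation on the training rows only and reuse them for the test rows.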

In [217]:
n_cols = predictors_norm.shape[1] # number of predictors


<a id="item1"></a>


<a id='item32'></a>


## Import Keras


Recall from the videos that Keras normally runs on top of a low-level library such as TensorFlow. This means that to use the Keras library, you have to install TensorFlow first, and when you import Keras, it reports which backend it is using. In CC Labs, Keras was installed with TensorFlow as the backend, so the import should clearly print that.


#### Let's go ahead and import the Keras library


In [218]:
import keras

As you can see, Keras is using the TensorFlow backend.


Let's import the rest of the packages from the Keras library that we will need to build our regression model.


In [219]:
from keras.models import Sequential
from keras.layers import Dense

<a id='item33'></a>


## Build a Neural Network


Let's define a function that defines our regression model for us so that we can conveniently call it to create our model.


In [220]:
# define regression model
def regression_model():
    # create model
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    #model.add(Dense(50, activation='relu'))
    model.add(Dense(1))
    
    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model

The above function creates a model that has one hidden layer of 10 units, followed by a single output unit.
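To get a feel for the size of this network, we can count its trainable parameters by hand. A fully connected layer has one weight per input-unit pair plus one bias per unit. The helper `dense_params` below is a hypothetical name used only for this sketch; it is not part of Keras.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

n_cols = 8  # number of predictors, as computed earlier

hidden = dense_params(n_cols, 10)  # hidden layer: 8 inputs -> 10 units
output = dense_params(10, 1)       # output layer: 10 inputs -> 1 unit
total = hidden + output

print(hidden, output, total)  # 90 11 101
```

With roughly a hundred parameters and about a thousand samples, the model is small, which helps limit overfitting. Calling `model.summary()` on the built model should report the same counts.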


<a id="item4"></a>


<a id='item34'></a>


## Train and Test the Network


Let's split the data into training and test sets.


In [221]:
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

print(predictors_norm)

        Cement  Blast Furnace Slag   Fly Ash     Water  Superplasticizer  \
0     2.476712           -0.856472 -0.846733 -0.916319         -0.620147   
1     2.476712           -0.856472 -0.846733 -0.916319         -0.620147   
2     0.491187            0.795140 -0.846733  2.174405         -1.038638   
3     0.491187            0.795140 -0.846733  2.174405         -1.038638   
4    -0.790075            0.678079 -0.846733  0.488555         -1.038638   
...        ...                 ...       ...       ...               ...   
1025 -0.045623            0.487998  0.564271 -0.092126          0.451190   
1026  0.392628           -0.856472  0.959602  0.675872          0.702285   
1027 -1.269472            0.759210  0.850222  0.521336         -0.017520   
1028 -1.168042            1.307430 -0.846733 -0.279443          0.852942   
1029 -0.193939            0.308349  0.376762  0.891286          0.400971   

      Coarse Aggregate  Fine Aggregate       Age  
0             0.862735       -1.2170

In [222]:
#split the data into training and test set 
X_train, X_test, y_train, y_test = train_test_split(predictors_norm, target, test_size=0.30, random_state=42)

In [223]:
print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)

(721, 8)
(309, 8)
(721,)
(309,)
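The shapes above follow from the `test_size=0.30` argument. As a sketch, assuming scikit-learn rounds the test count up and gives the remaining rows to the training set when no `train_size` is passed, the arithmetic is:

```python
import math

n_samples = 1030
test_size = 0.30

# Assumption: the test count is rounded up, the rest goes to training.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)  # 721 309
```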


In [224]:
# build the model
model = regression_model()

Next, we will train and validate the model at the same time using the *fit* method. We will hold out 30% of the training data for validation and train the model for 100 epochs.


In [226]:
# fit the model
model.fit(X_train, y_train, validation_split=0.3, epochs=100, verbose=2)

Train on 504 samples, validate on 217 samples
Epoch 1/100
 - 0s - loss: 772.2417 - val_loss: 717.2905
Epoch 2/100
 - 0s - loss: 750.0598 - val_loss: 697.1669
Epoch 3/100
 - 0s - loss: 728.3811 - val_loss: 676.9986
Epoch 4/100
 - 0s - loss: 707.0036 - val_loss: 657.2732
Epoch 5/100
 - 0s - loss: 685.8701 - val_loss: 637.7840
Epoch 6/100
 - 0s - loss: 665.1711 - val_loss: 618.9713
Epoch 7/100
 - 0s - loss: 645.3066 - val_loss: 600.3173
Epoch 8/100
 - 0s - loss: 625.2896 - val_loss: 582.3871
Epoch 9/100
 - 0s - loss: 606.2241 - val_loss: 564.4461
Epoch 10/100
 - 0s - loss: 587.4276 - val_loss: 546.9002
Epoch 11/100
 - 0s - loss: 569.2217 - val_loss: 529.7658
Epoch 12/100
 - 0s - loss: 551.5859 - val_loss: 512.8090
Epoch 13/100
 - 0s - loss: 534.1836 - val_loss: 496.8550
Epoch 14/100
 - 0s - loss: 517.5500 - val_loss: 481.1758
Epoch 15/100
 - 0s - loss: 501.4475 - val_loss: 465.8988
Epoch 16/100
 - 0s - loss: 485.8761 - val_loss: 450.7315
Epoch 17/100
 - 0s - loss: 470.2543 - val_loss: 436

<keras.callbacks.History at 0x7f9e981599d0>
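The first line of the log ("Train on 504 samples, validate on 217 samples") comes from `validation_split=0.3` applied to the 721 training rows. Keras documents that the validation rows are taken from the end of the data, before any shuffling. The sketch below reproduces the counts under the assumption that the cut point is the truncated value of `n * (1 - validation_split)`; the variable names are illustrative only.

```python
n_fit = 721             # rows passed to model.fit
validation_split = 0.3

# Assumed cut point: truncate n * (1 - validation_split) to an integer.
split_at = int(n_fit * (1.0 - validation_split))
n_val = n_fit - split_at

# The last n_val rows are held out for validation.
rows = list(range(n_fit))
train_rows, val_rows = rows[:split_at], rows[split_at:]

print(len(train_rows), len(val_rows))  # 504 217
```

Because the held-out rows come from the end of the arrays, the data should already be in random order, which is the case here since `train_test_split` shuffles by default.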

In [227]:
# evaluate the model on the test set (returns the mean squared error loss)
model.evaluate(X_test, y_test, verbose=0)

# calculate the test MSE from the model's predictions
y_pred = model.predict(X_test)
print(mean_squared_error(y_test, y_pred))

142.6932283163532
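The mean squared error is simply the average of the squared differences between targets and predictions. The sketch below computes it by hand with NumPy on made-up predictions (`y_hat` is hypothetical; the targets are the first three strengths from the dataset). It also shows one thing to watch for: `model.predict` returns a column of shape `(n, 1)`, so it should be flattened before subtracting it from a 1-D target, otherwise NumPy broadcasts to an `(n, n)` array.

```python
import numpy as np

y_true = np.array([79.99, 61.89, 40.27])  # first three strengths from the data
y_hat = np.array([75.0, 60.0, 45.0])      # hypothetical predictions

# MSE by hand: mean of squared residuals.
mse_manual = np.mean((y_true - y_hat) ** 2)

# Predictions shaped like Keras output: one column, n rows.
y_hat_2d = y_hat.reshape(-1, 1)
wrong_shape = (y_true - y_hat_2d).shape   # broadcasts to (3, 3)

# Flatten first to get the intended element-wise difference.
mse_flat = np.mean((y_true - y_hat_2d.ravel()) ** 2)

print(mse_manual, wrong_shape, mse_flat)
```

`sklearn.metrics.mean_squared_error` handles the `(n, 1)` shape for us, which is why the cell above works without flattening.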


In [228]:
# Train the model for 100 more epochs and calculate the test MSE after each epoch
num_epochs = 100
mse_list = []
for epoch in range(num_epochs):
    # Train the model on the training data
    model.fit(X_train, y_train, epochs=1, verbose=0)

    # Make predictions on the test set
    y_pred = model.predict(X_test)

    # Calculate mean squared error for the current epoch
    mse = mean_squared_error(y_test, y_pred)
    mse_list.append(mse)
    
    print(f'Epoch {epoch + 1}/{num_epochs}, MSE: {mse}')

Epoch 1/100, MSE: 142.00571982732868
Epoch 2/100, MSE: 141.3941825368123
Epoch 3/100, MSE: 140.65495775479246
Epoch 4/100, MSE: 140.2515303447833
Epoch 5/100, MSE: 139.54931663089837
Epoch 6/100, MSE: 138.8769190125961
Epoch 7/100, MSE: 138.14319425782958
Epoch 8/100, MSE: 137.44995068842806
Epoch 9/100, MSE: 136.80637426934948
Epoch 10/100, MSE: 136.21542309511136
Epoch 11/100, MSE: 135.51901443084085
Epoch 12/100, MSE: 134.81598003870755
Epoch 13/100, MSE: 134.19254547569565
Epoch 14/100, MSE: 133.3858177785203
Epoch 15/100, MSE: 132.81730926167384
Epoch 16/100, MSE: 132.03181374822634
Epoch 17/100, MSE: 131.51646858301154
Epoch 18/100, MSE: 130.71563144488292
Epoch 19/100, MSE: 130.09308896070544
Epoch 20/100, MSE: 129.67902429919235
Epoch 21/100, MSE: 128.90530050018444
Epoch 22/100, MSE: 128.16935077218392
Epoch 23/100, MSE: 127.8179520774663
Epoch 24/100, MSE: 126.91762915798097
Epoch 25/100, MSE: 126.22379573723326
Epoch 26/100, MSE: 125.47932603501245
Epoch 27/100, MSE: 124.654

In [229]:
# Calculate mean and standard deviation of MSEs

mean_mse = np.mean(mse_list)
std_mse = np.std(mse_list)
print(f'Mean MSE: {mean_mse}')
print(f'Standard Deviation of MSE: {std_mse}')

Mean MSE: 112.79675983490043
Standard Deviation of MSE: 15.356026451474316
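One detail worth keeping in mind when reading the standard deviation: `np.std` computes the population standard deviation (`ddof=0`) by default, while the pandas `.std()` used for normalization earlier computes the sample standard deviation (`ddof=1`). A short sketch using the first five MSE values printed above (the name `mse_sample` is only for illustration):

```python
import numpy as np

# First five per-epoch MSE values from the output above.
mse_sample = [
    142.00571982732868,
    141.3941825368123,
    140.65495775479246,
    140.2515303447833,
    139.54931663089837,
]

std_population = np.std(mse_sample)      # ddof=0, NumPy default
std_sample = np.std(mse_sample, ddof=1)  # ddof=1, matches pandas .std()

print(std_population, std_sample)
```

With 100 values the two differ by well under 1%, so the choice matters little here. Also note that these MSE values come from successive epochs of the same model, so they describe how the error changed during training rather than the spread across independent training runs.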
