
## Download and Clean Dataset

Let's start by importing the <em>pandas</em> and NumPy libraries.

In [22]:
import pandas as pd
import numpy as np

We will be using the dataset provided in the assignment.

<strong>The dataset is about the compressive strength of different samples of concrete based on the volumes of the different ingredients that were used to make them. Ingredients include:</strong>

<strong>1. Cement</strong>

<strong>2. Blast Furnace Slag</strong>

<strong>3. Fly Ash</strong>

<strong>4. Water</strong>

<strong>5. Superplasticizer</strong>

<strong>6. Coarse Aggregate</strong>

<strong>7. Fine Aggregate</strong>

Let's read the dataset into a <em>pandas</em> dataframe.

In [23]:
concrete_data = pd.read_csv('sample_data/concrete_data.csv')
concrete_data.head()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age,Strength
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28,79.99
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28,61.89
2,332.5,142.5,0.0,228.0,0.0,932.0,594.0,270,40.27
3,332.5,142.5,0.0,228.0,0.0,932.0,594.0,365,41.05
4,198.6,132.4,0.0,192.0,0.0,978.4,825.5,360,44.3


So the first concrete sample has 540 cubic meters of cement, 0 cubic meters of blast furnace slag, 0 cubic meters of fly ash, 162 cubic meters of water, 2.5 cubic meters of superplasticizer, 1040 cubic meters of coarse aggregate, and 676 cubic meters of fine aggregate. This concrete mix, which is 28 days old, has a compressive strength of 79.99 MPa.

#### Let's check how many data points we have.

In [24]:
concrete_data.shape

(1030, 9)

So, there are approximately 1000 samples to train our model on. With so few samples, we have to be careful not to overfit the training data.

Let's check the dataset for any missing values.

In [25]:
concrete_data.describe()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age,Strength
count,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0
mean,281.167864,73.895825,54.18835,181.567282,6.20466,972.918932,773.580485,45.662136,35.817961
std,104.506364,86.279342,63.997004,21.354219,5.973841,77.753954,80.17598,63.169912,16.705742
min,102.0,0.0,0.0,121.8,0.0,801.0,594.0,1.0,2.33
25%,192.375,0.0,0.0,164.9,0.0,932.0,730.95,7.0,23.71
50%,272.9,22.0,0.0,185.0,6.4,968.0,779.5,28.0,34.445
75%,350.0,142.95,118.3,192.0,10.2,1029.4,824.0,56.0,46.135
max,540.0,359.4,200.1,247.0,32.2,1145.0,992.6,365.0,82.6


In [26]:
concrete_data.isnull().sum()

Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64

The data looks very clean and is ready to be used to build our model.
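
For contrast, here is a minimal sketch of how the same check would flag a gap. The toy frame and its values are made up for illustration; only the `isnull().sum()` and `describe()` calls come from the cells above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame (not the concrete dataset) with one missing Water value.
toy = pd.DataFrame({
    "Cement": [540.0, 332.5, 198.6],
    "Water": [162.0, np.nan, 192.0],
})

# Per-column count of missing entries, the same call as used above.
missing_per_column = toy.isnull().sum()
print(missing_per_column)

# describe() reports the non-null count, so a count below len(toy) also reveals gaps.
non_null_counts = toy.describe().loc["count"]
print(non_null_counts)
```

In the concrete dataset every column has a count of 1030, which equals the number of rows, so both checks agree that nothing is missing.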

#### Split data into predictors and target

The target variable in this problem is the concrete sample strength. Therefore, our predictors will be all the other columns.

In [27]:
concrete_data_columns = concrete_data.columns
predictors = concrete_data[concrete_data_columns[concrete_data_columns != 'Strength']] # all columns except Strength
target = concrete_data['Strength'] # Strength column
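
An equivalent and arguably more readable way to make the same split is `DataFrame.drop`. The sketch below uses a hypothetical two-row stand-in (`toy_data`) with only three of the columns, so it runs without the CSV file.

```python
import pandas as pd

# Hypothetical stand-in for concrete_data, reduced to three columns.
toy_data = pd.DataFrame({
    "Cement": [540.0, 332.5],
    "Age": [28, 270],
    "Strength": [79.99, 40.27],
})

toy_predictors = toy_data.drop(columns=["Strength"])  # every column except Strength
toy_target = toy_data["Strength"]                      # the Strength column as a Series

print(list(toy_predictors.columns))
print(toy_target.tolist())
```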

Let's do a quick sanity check of the predictors and the target dataframes.

In [28]:
predictors.head()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28
2,332.5,142.5,0.0,228.0,0.0,932.0,594.0,270
3,332.5,142.5,0.0,228.0,0.0,932.0,594.0,365
4,198.6,132.4,0.0,192.0,0.0,978.4,825.5,360


In [29]:
target.head()

0    79.99
1    61.89
2    40.27
3    41.05
4    44.30
Name: Strength, dtype: float64

Finally, the last step is to normalize the data by subtracting the mean and dividing by the standard deviation.

In [30]:
predictors_norm = (predictors - predictors.mean()) / predictors.std()
predictors_norm.head()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age
0,2.476712,-0.856472,-0.846733,-0.916319,-0.620147,0.862735,-1.217079,-0.279597
1,2.476712,-0.856472,-0.846733,-0.916319,-0.620147,1.055651,-1.217079,-0.279597
2,0.491187,0.79514,-0.846733,2.174405,-1.038638,-0.526262,-2.239829,3.55134
3,0.491187,0.79514,-0.846733,2.174405,-1.038638,-0.526262,-2.239829,5.055221
4,-0.790075,0.678079,-0.846733,0.488555,-1.038638,0.070492,0.647569,4.976069
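
As a worked check of the normalization, the sketch below recomputes two entries of the first row by hand, using the means and standard deviations printed by `describe()` earlier. Because those statistics are rounded, the results should agree with the table only approximately. The second part uses made-up numbers to show that pandas `.std()` corresponds to NumPy's `std(ddof=1)`.

```python
import numpy as np

# Mean and standard deviation copied from the describe() output above.
cement_mean, cement_std = 281.167864, 104.506364
age_mean, age_std = 45.662136, 63.169912

# First sample: Cement = 540.0, Age = 28.
cement_z = (540.0 - cement_mean) / cement_std
age_z = (28 - age_mean) / age_std
print(round(cement_z, 6), round(age_z, 6))

# pandas .std() is the sample standard deviation (ddof=1); NumPy defaults to ddof=0.
values = np.array([1.0, 2.0, 3.0, 4.0])  # made-up numbers, for illustration only
z = (values - values.mean()) / values.std(ddof=1)
print(z.mean(), z.std(ddof=1))
```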


In [31]:
n_cols = predictors_norm.shape[1] # number of predictors
n_cols

8

<a id="item1"></a>

## Import Keras

#### Let's go ahead and import the Keras library

In [32]:
import keras

Keras uses TensorFlow as its backend.

Let's import the rest of the packages from the Keras library that we will need to build our regression model.

In [33]:
from keras.models import Sequential
from keras.layers import Dense

In [34]:
# define regression model
def regression_model():
    # create model
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(1))
    
    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model

The above function creates a model that has one hidden layer with 10 neurons and a ReLU activation function, followed by a single output neuron. It uses the Adam optimizer and the mean squared error as the loss function.
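
To get a feel for how small this network is, the sketch below counts its trainable parameters by hand. The helper `dense_param_count` is a hypothetical name introduced here, not a Keras function; it assumes each fully connected layer has one weight per input-unit pair plus one bias per unit.

```python
def dense_param_count(n_inputs: int, n_units: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units


n_cols = 8  # number of predictors, as computed earlier

hidden = dense_param_count(n_cols, 10)  # 8*10 weights + 10 biases
output = dense_param_count(10, 1)       # 10 weights + 1 bias
total = hidden + output
print(hidden, output, total)
```

Calling `model.summary()` on the built model should report the same total.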

Let's import train_test_split from scikit-learn to randomly split the data into training and test sets.

In [35]:
from sklearn.model_selection import train_test_split

Split the data into training and test sets, holding out 30% of the data for testing.

In [36]:
X_train, X_test, y_train, y_test = train_test_split(predictors_norm, target, test_size=0.3, random_state=42)

## Train and Test the Network

Let's call the function now to create our model.

In [37]:
# build the model
model = regression_model()

Next, we will train the model for 100 epochs.


In [38]:
# fit the model
epochs = 100
model.fit(X_train, y_train, epochs=epochs, verbose=2)

Epoch 1/100
23/23 - 1s - loss: 1689.9802
Epoch 2/100
23/23 - 0s - loss: 1671.0020
Epoch 3/100
23/23 - 0s - loss: 1653.5482
Epoch 4/100
23/23 - 0s - loss: 1637.5800
Epoch 5/100
23/23 - 0s - loss: 1622.6855
Epoch 6/100
23/23 - 0s - loss: 1608.7137
Epoch 7/100
23/23 - 0s - loss: 1595.3557
Epoch 8/100
23/23 - 0s - loss: 1582.5408
Epoch 9/100
23/23 - 0s - loss: 1570.0693
Epoch 10/100
23/23 - 0s - loss: 1557.7434
Epoch 11/100
23/23 - 0s - loss: 1545.4568
Epoch 12/100
23/23 - 0s - loss: 1533.3027
Epoch 13/100
23/23 - 0s - loss: 1520.9480
Epoch 14/100
23/23 - 0s - loss: 1508.5317
Epoch 15/100
23/23 - 0s - loss: 1495.9744
Epoch 16/100
23/23 - 0s - loss: 1483.0115
Epoch 17/100
23/23 - 0s - loss: 1469.6543
Epoch 18/100
23/23 - 0s - loss: 1455.7959
Epoch 19/100
23/23 - 0s - loss: 1441.4395
Epoch 20/100
23/23 - 0s - loss: 1426.4038
Epoch 21/100
23/23 - 0s - loss: 1410.5078
Epoch 22/100
23/23 - 0s - loss: 1393.8237
Epoch 23/100
23/23 - 0s - loss: 1376.5818
Epoch 24/100
23/23 - 0s - loss: 1358.2332
...

<keras.callbacks.History at 0x7f5eaa609610>
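
The `23/23` in each log line is the number of batches per epoch. A short calculation, assuming the Keras default batch size of 32 (no `batch_size` was passed to `fit`) and scikit-learn's rule of rounding the test set size up, reproduces it:

```python
import math

n_samples = 1030   # rows in concrete_data
test_size = 0.3    # fraction held out by train_test_split
batch_size = 32    # assumed: Keras default when fit() gets no batch_size

n_test = math.ceil(test_size * n_samples)  # scikit-learn rounds the test set up
n_train = n_samples - n_test
steps_per_epoch = math.ceil(n_train / batch_size)  # last batch may be partial
print(n_train, n_test, steps_per_epoch)
```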

Next we need to evaluate the model on the test data.

In [39]:
loss_val = model.evaluate(X_test, y_test)
y_pred = model.predict(X_test)
loss_val



228.3919677734375

Now we need to compute the mean squared error between the predicted concrete strength and the actual concrete strength.

Let's import the mean_squared_error function from scikit-learn.

In [40]:
from sklearn.metrics import mean_squared_error

In [41]:
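# MSE between the actual and the predicted strengths (a single scalar)
mean_square_error = mean_squared_error(y_test, y_pred)
# with only one MSE value, the mean is the value itself and the standard deviation is 0
mean = np.mean(mean_square_error)
standard_deviation = np.std(mean_square_error)
print(mean, standard_deviation)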

228.39195853600395 0.0
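
The standard deviation printed above is 0.0 because it is taken over a single number. The sketch below, with made-up targets and predictions, shows the mean squared error computed by hand and why a spread only becomes meaningful once several MSE values are collected.

```python
import numpy as np

# Made-up targets and predictions, for illustration only.
y_true = np.array([3.0, 5.0, 2.0])
y_hat = np.array([2.0, 7.0, 2.0])

# Mean squared error: the average of the squared residuals.
mse = np.mean((y_true - y_hat) ** 2)
print(mse)

# The standard deviation of a single number is always zero,
spread_of_one = np.std(mse)
# whereas several MSE values (hypothetical here) give a non-zero spread.
spread_of_many = np.std(np.array([30.0, 34.0, 38.0]))
print(spread_of_one, spread_of_many)
```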


Create a list of 50 mean squared errors and report the mean and the standard deviation of the mean squared errors.

In [42]:
total_mean_squared_errors = 50
epochs = 100
mean_squared_errors = []
for i in range(0, total_mean_squared_errors):
    # new random 70/30 split for each run
    X_train, X_test, y_train, y_test = train_test_split(predictors_norm, target, test_size=0.3, random_state=i)
    # note: the same model object is reused, so training continues from the previous run's weights
    model.fit(X_train, y_train, epochs=epochs, verbose=0)
    MSE = model.evaluate(X_test, y_test, verbose=0)
    print("MSE "+str(i+1)+": "+str(MSE))
    y_pred = model.predict(X_test)
    mean_square_error = mean_squared_error(y_test, y_pred)
    mean_squared_errors.append(mean_square_error)

mean_squared_errors = np.array(mean_squared_errors)
mean = np.mean(mean_squared_errors)
standard_deviation = np.std(mean_squared_errors)

print('\n')
print("Below is the mean and standard deviation of " +str(total_mean_squared_errors) + " mean squared errors with normalized data. Total number of epochs for each training is: " +str(epochs) + "\n")
print("Mean: "+str(mean))
print("Standard Deviation: "+str(standard_deviation))

MSE 1: 86.2396469116211
MSE 2: 68.69584655761719
MSE 3: 40.30857849121094
MSE 4: 39.252750396728516
MSE 5: 37.28645706176758
MSE 6: 37.54557800292969
MSE 7: 38.41786193847656
MSE 8: 30.55942153930664
MSE 9: 31.367332458496094
MSE 10: 32.08747482299805
MSE 11: 31.031707763671875
MSE 12: 29.167804718017578
MSE 13: 32.37705612182617
MSE 14: 36.53057861328125
MSE 15: 30.395097732543945
MSE 16: 24.485706329345703
MSE 17: 28.658761978149414
MSE 18: 30.03765296936035
MSE 19: 25.50555992126465
MSE 20: 29.388710021972656
MSE 21: 28.091920852661133
MSE 22: 28.231853485107422
MSE 23: 25.38604164123535
MSE 24: 27.997488021850586
MSE 25: 29.055313110351562
MSE 26: 31.912254333496094
MSE 27: 26.39688491821289
MSE 28: 26.033781051635742
MSE 29: 30.262388229370117
MSE 30: 26.088159561157227
MSE 31: 25.927000045776367
MSE 32: 24.29895782470703
MSE 33: 24.512470245361328
MSE 34: 25.799686431884766
MSE 35: 27.330236434936523
MSE 36: 31.362218856811523
MSE 37: 23.67302131652832
MSE 38: 28.24854278564453
...

So the mean of the mean squared errors is now 30.963, down from the previous value of 53.087, and the standard deviation is now 10.5, down from 25.935.
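
One caveat when reading these numbers: the loop calls `model.fit` on the same `model` object in every iteration, so each run continues from the weights learned in earlier runs, and samples that land in a later test set may already have been trained on. One way to make the 50 runs independent is to build a new model inside the loop, for example with `model = regression_model()`. The sketch below illustrates that pattern with a hypothetical least-squares stand-in so that it runs without Keras; `LeastSquaresModel` and `repeated_mse` are illustrative names and the data is synthetic.

```python
import numpy as np


class LeastSquaresModel:
    """Hypothetical stand-in for a Keras model: exposes fit() and predict()."""

    def fit(self, X, y):
        # Append a bias column and solve ordinary least squares.
        A = np.column_stack([X, np.ones(len(X))])
        self.coef_, *_ = np.linalg.lstsq(A, y, rcond=None)
        return self

    def predict(self, X):
        A = np.column_stack([X, np.ones(len(X))])
        return A @ self.coef_


def repeated_mse(build_model, X, y, n_repeats=50, test_size=0.3):
    """Train a freshly built model on a new random split in every repeat."""
    errors = []
    for seed in range(n_repeats):
        split_rng = np.random.default_rng(seed)
        order = split_rng.permutation(len(X))
        n_test = int(np.ceil(test_size * len(X)))
        test_idx, train_idx = order[:n_test], order[n_test:]

        model = build_model()  # new, untrained model: nothing carries over
        model.fit(X[train_idx], y[train_idx])
        residuals = y[test_idx] - model.predict(X[test_idx])
        errors.append(np.mean(residuals ** 2))
    return np.array(errors)


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

errors = repeated_mse(LeastSquaresModel, X, y, n_repeats=5)
print(errors.mean(), errors.std())
```

In the notebook, passing `regression_model` as `build_model` would follow the same idea, at the cost of training 50 separate networks.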