# Course project - Deep Learning & Neural Networks with Keras

The assignment: In this course project, you will build a regression model using the Keras deep learning library. You will then experiment with increasing the number of training epochs and changing the number of hidden layers to see how these parameters affect the performance of the model.


### Concrete Data:

For your convenience, the data can be found here again: https://cocl.us/concrete_data. To recap, the predictors in the concrete strength data include:

1. Cement
2. Blast Furnace Slag
3. Fly Ash
4. Water
5. Superplasticizer
6. Coarse Aggregate
7. Fine Aggregate


### Assignment Instructions:

#### [A](#A). Build a baseline model

Use the Keras library to build a neural network with the following:

- One hidden layer of 10 nodes, and a ReLU activation function

- Use the adam optimizer and the mean squared error as the loss function.

1. Randomly split the data into training and test sets by holding out 30% of the data for testing. You can use the train_test_split helper function from Scikit-learn.

2. Train the model on the training data using 50 epochs.

3. Evaluate the model on the test data and compute the mean squared error between the predicted concrete strength and the actual concrete strength. You can use the mean_squared_error function from Scikit-learn.

4. Repeat steps 1 - 3, 50 times, i.e., create a list of 50 mean squared errors.

5. Report the mean and the standard deviation of the mean squared errors.
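
The five steps above amount to a repeated random hold-out evaluation. The following is a minimal, standard-library-only sketch of that protocol, not the notebook's implementation: `repeated_holdout` and `mean_baseline` are hypothetical names, the data is synthetic, and the stand-in model simply predicts the training mean. In the assignment itself, a Keras model and the Scikit-learn helpers would take their place.

```python
import math
import random
import statistics


def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def repeated_holdout(fit_predict, xs, ys, n_repeats=50, test_fraction=0.3, seed=0):
    """Steps 1-4: split, fit, score on the held-out part, repeat."""
    rng = random.Random(seed)
    n_test = math.ceil(test_fraction * len(xs))
    errors = []
    for _ in range(n_repeats):
        idx = list(range(len(xs)))
        rng.shuffle(idx)  # a fresh random split on every repeat
        test_idx, train_idx = idx[:n_test], idx[n_test:]
        x_train = [xs[i] for i in train_idx]
        y_train = [ys[i] for i in train_idx]
        x_test = [xs[i] for i in test_idx]
        y_test = [ys[i] for i in test_idx]
        y_pred = fit_predict(x_train, y_train, x_test)
        errors.append(mean_squared_error(y_test, y_pred))
    return errors


def mean_baseline(x_train, y_train, x_test):
    """Hypothetical stand-in model: always predict the training mean."""
    mu = statistics.fmean(y_train)
    return [mu for _ in x_test]


# Synthetic data, for illustration only (not the concrete dataset)
data_rng = random.Random(42)
xs = [[data_rng.uniform(0, 1)] for _ in range(200)]
ys = [3.0 * x[0] + data_rng.gauss(0, 0.1) for x in xs]

errors = repeated_holdout(mean_baseline, xs, ys)
# Step 5: report the mean and standard deviation of the 50 MSEs
print(len(errors), statistics.fmean(errors), statistics.stdev(errors))
```

Passing the model as a callable keeps the splitting and scoring logic independent of the library used for fitting.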

#### [B](#B). Normalize the data

Repeat Part A but use a normalized version of the data. Recall that one way to normalize the data is by subtracting the mean from the individual predictors and dividing by the standard deviation.

How does the mean of the mean squared errors compare to that from Step A?

#### [C](#C). Increase the number of epochs

Repeat Part B but use 100 epochs this time for training.

How does the mean of the mean squared errors compare to that from Step B?

#### [D](#D). Increase the number of hidden layers

Repeat part B but use a neural network with the following instead:

- Three hidden layers, each of 10 nodes and ReLU activation function.

How does the mean of the mean squared errors compare to that from Step B?



### How to submit:
You will need to submit your code for each part in a Jupyter Notebook. Since each part builds on the previous one, you can submit the same notebook four times for grading. Please make sure that you:

- use Markdown to clearly label your code for each part,
- properly comment your code so that your peer who is grading your work is able to understand your code easily,
- include your comments and discussion of the difference in the mean of the mean squared errors among the different parts.



## My Implementation: Regression Model in Keras

### Imports

In [1]:
import pandas as pd
import numpy as np
import keras
from keras.models import Sequential
from keras.layers import Dense
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


### Download and validate dataset

In [2]:
concrete_df = pd.read_csv('https://cocl.us/concrete_data')

In [3]:
# data statistics/overview
concrete_df.describe(percentiles=[0.16, 0.5, 0.84])

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| count | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 |
| mean | 281.167864 | 73.895825 | 54.18835 | 181.567282 | 6.20466 | 972.918932 | 773.580485 | 45.662136 | 35.817961 |
| std | 104.506364 | 86.279342 | 63.997004 | 21.354219 | 5.973841 | 77.753954 | 80.17598 | 63.169912 | 16.705742 |
| min | 102.0 | 0.0 | 0.0 | 121.8 | 0.0 | 801.0 | 594.0 | 1.0 | 2.33 |
| 16% | 166.8 | 0.0 | 0.0 | 159.0 | 0.0 | 886.244 | 694.1 | 7.0 | 17.9104 |
| 50% | 272.9 | 22.0 | 0.0 | 185.0 | 6.4 | 968.0 | 779.5 | 28.0 | 34.445 |
| 84% | 387.0 | 182.296 | 125.2 | 200.0 | 11.6 | 1056.4 | 856.0 | 90.0 | 53.4152 |
| max | 540.0 | 359.4 | 200.1 | 247.0 | 32.2 | 1145.0 | 992.6 | 365.0 | 82.6 |


In [4]:
# checking for (problematic) null entries
concrete_df.isnull().sum()

Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64

### <a id='features'>Dataset preprocessing</a>

In [5]:
# Predictor/Target column labels
predictor_keys = [
    'Cement',
    'Blast Furnace Slag',
    'Fly Ash',
    'Water',
    'Superplasticizer',
    'Coarse Aggregate',
    'Fine Aggregate',
]

target_key = ['Strength']

In [6]:
# Predictor data -> X
predictor_df = concrete_df[predictor_keys]
X = predictor_df
X.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate |
|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 |


In [7]:
# Target data -> Y
target_df = concrete_df[target_key]
Y = target_df
Y.head()

| | Strength |
|---|---|
| 0 | 79.99 |
| 1 | 61.89 |
| 2 | 40.27 |
| 3 | 41.05 |
| 4 | 44.3 |


#### <a id='funcs'>Automation functions</a>

In [8]:
def reg_model_factory(input_shape,
                   n_hidden=1, n_nodes=10, hidden_layer_type=Dense,
                   n_outputs=1, output_layer=Dense, activations='relu',
                   optimizer='adam', loss='mean_squared_error',
                   extra_kw={}):
    """
    Generate a tf.keras.Model instance (Sequential, regression model) with
    a specified number of Dense layers and nodes
    """
    # some validation checks
    if isinstance(n_nodes, int):
        n_nodes = n_hidden*[n_nodes]
    if not isinstance(hidden_layer_type, (tuple, list)):
        hidden_layer_type = n_hidden*[hidden_layer_type]
    if not isinstance(activations, (tuple, list)):
        activations = n_hidden*[activations]
    assert len(n_nodes) >= n_hidden
    assert len(hidden_layer_type) >= n_hidden
    assert len(activations) >= n_hidden
    
    # start building model
    model = Sequential()
    # add hidden layers
    for i in range(n_hidden):
        layer = hidden_layer_type[i]
        units = n_nodes[i]
        layer_kw = dict(activation=activations[i])  
        if i == 0:
            layer_kw['input_shape'] = input_shape
        try:
            model.add(layer(units, **layer_kw))
        except TypeError:
            model.add(layer(**extra_kw))
    
    # add output layer
    output_kw = {}
    if len(activations) > n_hidden:
        output_kw['activation'] = activations[-1]
    model.add(output_layer(n_outputs, **output_kw))
    
    # compile model
    model.compile(optimizer=optimizer, loss=loss)
    
    return model

In [9]:
# generate test model
input_shape = (len(predictor_keys),)
test_model = reg_model_factory(input_shape, 1, 10)
test_model.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 10)                80        
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 11        
Total params: 91
Trainable params: 91
Non-trainable params: 0
_________________________________________________________________


In [10]:
# generate another test model
test2_model = reg_model_factory(input_shape, 3, 10)
test2_model.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_2 (Dense)              (None, 10)                80        
_________________________________________________________________
dense_3 (Dense)              (None, 10)                110       
_________________________________________________________________
dense_4 (Dense)              (None, 10)                110       
_________________________________________________________________
dense_5 (Dense)              (None, 1)                 11        
Total params: 311
Trainable params: 311
Non-trainable params: 0
_________________________________________________________________
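
The parameter counts in the two summaries above can be checked by hand: a fully connected layer has one weight per input-output pair plus one bias per output. The helper below is a small sketch for that arithmetic; `dense_param_count` is a hypothetical name and not part of Keras.

```python
def dense_param_count(layer_sizes):
    """Weights plus biases for a stack of fully connected layers.

    layer_sizes lists the width of every layer, input first, output last.
    """
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))


# 7 predictors -> 10 hidden nodes -> 1 output
print(dense_param_count([7, 10, 1]))          # (7*10 + 10) + (10*1 + 1) = 91
# 7 predictors -> three hidden layers of 10 nodes -> 1 output
print(dense_param_count([7, 10, 10, 10, 1]))  # 80 + 110 + 110 + 11 = 311
```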


<a id='reset'>Keras model resetter</a>

In [11]:
def reset_keras_model(model):
    """
    Reset/Reinitialize parameters, weights, and biases of a tf.keras.Model instance
    """
    for layer in model.layers:
        if isinstance(layer, keras.Model):
            reset_keras_model(layer)
            continue
        if hasattr(layer, 'cell'):
            init_container = layer.cell
        else:
            init_container = layer
        for key, initializer in init_container.__dict__.items():
            if 'initializer' not in key:
                continue
            var = getattr(init_container, key.replace('_initializer', ''))
            if var is not None:
                var.assign(initializer(var.shape, var.dtype))

<a id='loop'>Fitting and evaluation looper</a>

In [12]:
def fit_eval(model, x, y, test_size=0.3, iterations=50, epochs=50, v=0):
    """
    Loop through several iterations of fitting and evaluating a tf.keras.Model instance
    """
    scores = []
    for i in range(iterations):
        reset_keras_model(model)
        x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=test_size)
        model.fit(x_train, y_train, epochs=epochs, validation_split=0.0, verbose=v)
        score = model.evaluate(x_test, y_test, verbose=v)
        scores.append(score)
    return np.asarray(scores) 

### <a id='A'>Assignment - A</a>

Let's first do a single iteration manually, to see what we get... afterwards, we will loop through everything.

First do the train/test split of the predictor data `X` and target data `Y` (for details see [here](#features)) 

In [13]:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3, random_state=42)
X_train.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate |
|---|---|---|---|---|---|---|---|
| 196 | 194.7 | 0.0 | 100.5 | 165.6 | 7.5 | 1006.4 | 905.9 |
| 631 | 325.0 | 0.0 | 0.0 | 184.0 | 0.0 | 1063.0 | 783.0 |
| 81 | 318.8 | 212.5 | 0.0 | 155.7 | 14.3 | 852.1 | 880.4 |
| 526 | 359.0 | 19.0 | 141.0 | 154.0 | 10.9 | 942.0 | 801.0 |
| 830 | 162.0 | 190.0 | 148.0 | 179.0 | 19.0 | 838.0 | 741.0 |


Then create a NN regression model with the specified properties (for details see [here](#funcs))

In [14]:
input_shape = (len(predictor_keys),)
mdlA = reg_model_factory(input_shape, n_hidden=1, n_nodes=10)
mdlA.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_6 (Dense)              (None, 10)                80        
_________________________________________________________________
dense_7 (Dense)              (None, 1)                 11        
Total params: 91
Trainable params: 91
Non-trainable params: 0
_________________________________________________________________


Fit the model using the training data

In [15]:
mdlA.fit(X_train, Y_train, epochs=50, validation_split=0.0, verbose=2)

Epoch 1/50
23/23 - 0s - loss: 21123.1895
Epoch 2/50
23/23 - 0s - loss: 3516.6565
Epoch 3/50
23/23 - 0s - loss: 2249.4541
Epoch 4/50
23/23 - 0s - loss: 1879.6427
Epoch 5/50
23/23 - 0s - loss: 1577.3420
Epoch 6/50
23/23 - 0s - loss: 1333.7938
Epoch 7/50
23/23 - 0s - loss: 1132.9623
Epoch 8/50
23/23 - 0s - loss: 987.0417
Epoch 9/50
23/23 - 0s - loss: 875.6826
Epoch 10/50
23/23 - 0s - loss: 788.3438
Epoch 11/50
23/23 - 0s - loss: 718.5832
Epoch 12/50
23/23 - 0s - loss: 657.4121
Epoch 13/50
23/23 - 0s - loss: 610.2436
Epoch 14/50
23/23 - 0s - loss: 576.2697
Epoch 15/50
23/23 - 0s - loss: 545.3533
Epoch 16/50
23/23 - 0s - loss: 516.2440
Epoch 17/50
23/23 - 0s - loss: 491.8597
Epoch 18/50
23/23 - 0s - loss: 472.7775
Epoch 19/50
23/23 - 0s - loss: 453.5029
Epoch 20/50
23/23 - 0s - loss: 438.4642
Epoch 21/50
23/23 - 0s - loss: 421.4355
Epoch 22/50
23/23 - 0s - loss: 405.7841
Epoch 23/50
23/23 - 0s - loss: 391.9614
Epoch 24/50
23/23 - 0s - loss: 379.6284
Epoch 25/50
23/23 - 0s - loss: 366.5983
...

<tensorflow.python.keras.callbacks.History at 0x7fb5d8137c10>
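
The `23/23` in the training log is the number of batches per epoch. The short calculation below reconstructs it, assuming that the 30% test share is rounded up to a whole number of rows and that `fit` uses the Keras default batch size of 32 (no `batch_size` is passed in the cell above).

```python
import math

n_samples = 1030      # row count reported by describe()
test_percent = 30     # 30% of the rows are held out for testing
batch_size = 32       # Keras default when fit() receives no batch_size

# Assumption: the test count is rounded up, the remainder is used for training
n_test = math.ceil(n_samples * test_percent / 100)
n_train = n_samples - n_test

# The last batch of an epoch may be smaller than batch_size
steps_per_epoch = math.ceil(n_train / batch_size)

print(n_test, n_train, steps_per_epoch)
```

Under these assumptions, 721 training rows give 23 batches per epoch, which matches the log.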

Evaluate the model using the previously specified score metric

In [16]:
score = mdlA.evaluate(X_test, Y_test, verbose=0)
score

189.46768188476562

Reset model parameters (for details see [here](#reset))...

In [17]:
reset_keras_model(mdlA)

Now, everything again 50 times with 50 epochs (for details see [here](#loop))...

In [18]:
scoresA = fit_eval(mdlA, X, Y, epochs=50, iterations=50)

Finally, we report on the mean and standard deviation (and other statistics) of all mean squared errors (MSEs):

In [19]:
pd.DataFrame(scoresA).describe(percentiles=[0.16, 0.5, 0.84])

| | 0 |
|---|---|
| count | 50.0 |
| mean | 359.625112 |
| std | 335.788911 |
| min | 160.680176 |
| 16% | 185.080488 |
| 50% | 263.727966 |
| 84% | 416.513478 |
| max | 1629.979126 |
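
The percentiles 0.16 and 0.84 passed to `describe` were presumably chosen because, for a normal distribution, they lie roughly one standard deviation below and above the mean (this is an assumption about the intent). The sketch below checks that correspondence and compares it with the values reported for strategy A.

```python
from statistics import NormalDist

# Share of a standard normal distribution below -1 and below +1 standard deviations
lower = NormalDist().cdf(-1.0)
upper = NormalDist().cdf(1.0)
print(round(lower, 4), round(upper, 4))

# Summary values reported for strategy A
mean_a, std_a = 359.625112, 335.788911
p16_a, p84_a = 185.080488, 416.513478

# For normally distributed scores these pairs would be close to each other
print(mean_a - std_a, p16_a)
print(mean_a + std_a, p84_a)
```

Here the pairs differ widely, which indicates a skewed distribution whose mean and standard deviation are dominated by a few very large errors.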


### <a id='B'>Assignment - B</a>

First, we <a id='norm'>normalize the data</a>...

In [20]:
# Normalize using the StandardScaler from scikit-learn
X_norm = StandardScaler().fit_transform(predictor_df)
X_norm = pd.DataFrame(X_norm, columns=predictor_keys)
X_norm.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate |
|---|---|---|---|---|---|---|---|
| 0 | 2.477915 | -0.856888 | -0.847144 | -0.916764 | -0.620448 | 0.863154 | -1.21767 |
| 1 | 2.477915 | -0.856888 | -0.847144 | -0.916764 | -0.620448 | 1.056164 | -1.21767 |
| 2 | 0.491425 | 0.795526 | -0.847144 | 2.175461 | -1.039143 | -0.526517 | -2.240917 |
| 3 | 0.491425 | 0.795526 | -0.847144 | 2.175461 | -1.039143 | -0.526517 | -2.240917 |
| 4 | -0.790459 | 0.678408 | -0.847144 | 0.488793 | -1.039143 | 0.070527 | 0.647884 |


In [21]:
# just checking that StandardScaler yields the desired output
((predictor_df - predictor_df.mean()) / predictor_df.std(ddof=0)).head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate |
|---|---|---|---|---|---|---|---|
| 0 | 2.477915 | -0.856888 | -0.847144 | -0.916764 | -0.620448 | 0.863154 | -1.21767 |
| 1 | 2.477915 | -0.856888 | -0.847144 | -0.916764 | -0.620448 | 1.056164 | -1.21767 |
| 2 | 0.491425 | 0.795526 | -0.847144 | 2.175461 | -1.039143 | -0.526517 | -2.240917 |
| 3 | 0.491425 | 0.795526 | -0.847144 | 2.175461 | -1.039143 | -0.526517 | -2.240917 |
| 4 | -0.790459 | 0.678408 | -0.847144 | 0.488793 | -1.039143 | 0.070527 | 0.647884 |
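
The `ddof=0` in the check above matters: `StandardScaler` divides by the population standard deviation, whereas pandas `std()` and `describe()` default to the sample standard deviation (`ddof=1`). As a rough worked example, the normalized Cement value of row 0 can be reproduced from the statistics printed earlier.

```python
import math

n = 1030                   # count from describe()
mean_cement = 281.167864   # mean from describe()
std_sample = 104.506364    # std from describe(), computed with ddof=1

# Convert the sample standard deviation (ddof=1) to the population one (ddof=0)
std_population = std_sample * math.sqrt((n - 1) / n)

x = 540.0  # Cement value in row 0
z_population = (x - mean_cement) / std_population  # about 2.4779, as in the tables
z_sample = (x - mean_cement) / std_sample          # slightly smaller
print(round(z_population, 4), round(z_sample, 4))
```

With 1030 rows the two conventions differ only in the third decimal place, so the choice has little practical effect on training here.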


Create and compile a fresh instance of the specified model (for details see [here](#funcs))...

In [22]:
input_shape = (len(predictor_keys),)
mdlB = reg_model_factory(input_shape, n_hidden=1, n_nodes=10)
mdlB.summary()

Model: "sequential_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_8 (Dense)              (None, 10)                80        
_________________________________________________________________
dense_9 (Dense)              (None, 1)                 11        
Total params: 91
Trainable params: 91
Non-trainable params: 0
_________________________________________________________________


Run the loop (for details see [here](#loop))...

In [23]:
scoresB = fit_eval(mdlB, X_norm, Y, epochs=50, iterations=50)

Finally, we report on the mean and standard deviation (and other statistics) of all mean squared errors (MSEs):

In [24]:
pd.DataFrame(scoresB).describe(percentiles=[0.16, 0.5, 0.84])

| | 0 |
|---|---|
| count | 50.0 |
| mean | 404.022793 |
| std | 110.455943 |
| min | 231.967438 |
| 16% | 301.593983 |
| 50% | 383.993896 |
| 84% | 495.054626 |
| max | 679.416077 |


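The mean of the mean squared errors is actually somewhat higher than with strategy A (about 404 versus 360), and so is the median (about 384 versus 264), so at 50 epochs normalizing does not improve the mean in this run. However, the distribution has a much lower standard deviation (about 110 versus 336) and fewer extreme outliers (a maximum of about 679 versus 1630), so the results with normalized data are considerably more consistent.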
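
To judge whether the difference between the two means is meaningful, one can compare it with the standard error of the means. The following is a rough sketch, assuming the 50 runs of each strategy are independent; it uses only the summary values reported for strategies A and B.

```python
import math

n_runs = 50
mean_a, std_a = 359.625112, 335.788911   # strategy A
mean_b, std_b = 404.022793, 110.455943   # strategy B

# Standard error of each mean, and of the difference of the two means
se_a = std_a / math.sqrt(n_runs)
se_b = std_b / math.sqrt(n_runs)
se_diff = math.sqrt(se_a ** 2 + se_b ** 2)

diff = mean_b - mean_a
print(round(diff, 2), round(se_diff, 2), round(diff / se_diff, 2))
```

The difference is smaller than one combined standard error, so with 50 runs the two means cannot be reliably distinguished; the clear difference between A and B lies in the spread.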

### <a id='C'>Assignment - C</a>

Create and compile a fresh instance of the specified model (for details see [here](#funcs))...

In [25]:
input_shape = (len(predictor_keys),)
mdlC = reg_model_factory(input_shape, n_hidden=1, n_nodes=10)
mdlC.summary()

Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_10 (Dense)             (None, 10)                80        
_________________________________________________________________
dense_11 (Dense)             (None, 1)                 11        
Total params: 91
Trainable params: 91
Non-trainable params: 0
_________________________________________________________________


Use the normalized data `X_norm` again (for details see [here](#norm))

... and run the loop, but this time with 100 epochs each (for details see [here](#loop))

In [26]:
scoresC = fit_eval(mdlC, X_norm, Y, epochs=100, iterations=50)

Finally, we report on the mean and standard deviation (and other statistics) of all mean squared errors (MSEs):

In [27]:
pd.DataFrame(scoresC).describe(percentiles=[0.16, 0.5, 0.84])

| | 0 |
|---|---|
| count | 50.0 |
| mean | 184.97035 |
| std | 14.719273 |
| min | 156.594299 |
| 16% | 173.597084 |
| 50% | 182.163155 |
| 84% | 200.422345 |
| max | 227.403625 |


The mean of the mean squared errors improved substantially after doubling the number of training epochs compared to strategy B (about 185 versus 404), suggesting that 50 epochs are not sufficient for training. The spread of the distribution has also become much narrower (a standard deviation of about 15 versus 110), meaning we get more consistent results.

### <a id='D'>Assignment - D</a>

This time we compile a (slightly) deeper model with 3 hidden layers (for details see [here](#funcs))

In [28]:
input_shape = (len(predictor_keys),)
mdlD = reg_model_factory(input_shape, n_hidden=3, n_nodes=10)
mdlD.summary()

Model: "sequential_5"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_12 (Dense)             (None, 10)                80        
_________________________________________________________________
dense_13 (Dense)             (None, 10)                110       
_________________________________________________________________
dense_14 (Dense)             (None, 10)                110       
_________________________________________________________________
dense_15 (Dense)             (None, 1)                 11        
Total params: 311
Trainable params: 311
Non-trainable params: 0
_________________________________________________________________


Use the normalized data `X_norm` again (for details see [here](#norm))...

... and run the loop, again with 50 epochs each as in strategy B (for details see [here](#loop))

In [29]:
scoresD = fit_eval(mdlD, X_norm, Y, epochs=50, iterations=50)

Finally, we report on the mean and standard deviation (and other statistics) of all mean squared errors (MSEs):

In [30]:
pd.DataFrame(scoresD).describe(percentiles=[0.16, 0.5, 0.84])

| | 0 |
|---|---|
| count | 50.0 |
| mean | 164.111621 |
| std | 10.154138 |
| min | 140.475433 |
| 16% | 154.576243 |
| 50% | 164.122971 |
| 84% | 172.840353 |
| max | 187.552811 |


The mean of the mean squared errors improved substantially with a deeper neural net compared to strategy B (about 164 versus 404), and is also slightly better than the result from strategy C (about 164 versus 185). This suggests that, for this dataset, a deeper network trained for 50 epochs can perform even better than a shallow model trained for 100 epochs.
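
As a compact summary, the sketch below expresses the reported mean MSEs of all four strategies as relative changes. The numbers are copied from the `describe` outputs above; `relative_change` is a hypothetical helper.

```python
# Mean of the 50 mean squared errors, as reported for each strategy
mean_mse = {
    'A': 359.625112,   # raw data, 1 hidden layer, 50 epochs
    'B': 404.022793,   # normalized data, 1 hidden layer, 50 epochs
    'C': 184.970350,   # normalized data, 1 hidden layer, 100 epochs
    'D': 164.111621,   # normalized data, 3 hidden layers, 50 epochs
}


def relative_change(new, reference):
    """Percentage change of `new` with respect to `reference`."""
    return 100.0 * (new - reference) / reference


for name in ('A', 'C', 'D'):
    change = relative_change(mean_mse[name], mean_mse['B'])
    print(f"{name} vs B: {change:+.1f}%")
print(f"D vs C: {relative_change(mean_mse['D'], mean_mse['C']):+.1f}%")
```

Negative values mean a lower (better) mean MSE than the reference strategy.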