# Introduction to Deep Learning & Neural Networks with Keras
Part C: Increase the number of epochs

## Assignment Topic:
In this project, we will build a regression model with the Keras library to predict concrete compressive strength.

## Concrete Data:

The data can be found here: [Dataset Link](https://cocl.us/concrete_data)

The predictors in the concrete strength data are:
1.   Cement
2.   Blast Furnace Slag
3.   Fly Ash
4.   Water
5.   Superplasticizer
6.   Coarse Aggregate
7.   Fine Aggregate
8.   Age

Download the concrete data file from the specified URL using wget

In [None]:
# Uncomment the following line if you need to download the dataset directly from the download link
# !wget "https://cocl.us/concrete_data" -O concrete_data.csv

--2023-10-03 17:39:51--  https://cocl.us/concrete_data
Resolving cocl.us (cocl.us)... 23.44.229.68, 23.44.229.71, 2600:1405:7400:21::17c2:7fa5, ...
Connecting to cocl.us (cocl.us)|23.44.229.68|:443... connected.
HTTP request sent, awaiting response... 302 Moved Temporarily
Location: https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DL0101EN/labs/data/concrete_data.csv [following]
--2023-10-03 17:39:52--  https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DL0101EN/labs/data/concrete_data.csv
Resolving s3-api.us-geo.objectstorage.softlayer.net (s3-api.us-geo.objectstorage.softlayer.net)... 67.228.254.196
Connecting to s3-api.us-geo.objectstorage.softlayer.net (s3-api.us-geo.objectstorage.softlayer.net)|67.228.254.196|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 58988 (58K) [text/csv]
Saving to: ‘concrete_data.csv’


2023-10-03 17:39:52 (1.66 MB/s) - ‘concrete_data.csv’ saved [58988/58988]



Import necessary libraries and modules for data manipulation, splitting, metrics, and building a neural network model

In [None]:
import pandas as pd
import numpy as np

from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

Read the data from the CSV file 'concrete_data.csv' into a DataFrame (df)

In [None]:
df = pd.read_csv('concrete_data.csv')

Display the first few rows of the DataFrame to get an initial glimpse of the data

In [None]:
df.head()

,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age,Strength
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28,79.99
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28,61.89
2,332.5,142.5,0.0,228.0,0.0,932.0,594.0,270,40.27
3,332.5,142.5,0.0,228.0,0.0,932.0,594.0,365,41.05
4,198.6,132.4,0.0,192.0,0.0,978.4,825.5,360,44.3



Check the shape of the DataFrame to determine the number of rows and columns

In [None]:
df.shape

(1030, 9)

Display summary statistics of the DataFrame

In [None]:
df.describe()

,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age,Strength
count,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0,1030.0
mean,281.167864,73.895825,54.18835,181.567282,6.20466,972.918932,773.580485,45.662136,35.817961
std,104.506364,86.279342,63.997004,21.354219,5.973841,77.753954,80.17598,63.169912,16.705742
min,102.0,0.0,0.0,121.8,0.0,801.0,594.0,1.0,2.33
25%,192.375,0.0,0.0,164.9,0.0,932.0,730.95,7.0,23.71
50%,272.9,22.0,0.0,185.0,6.4,968.0,779.5,28.0,34.445
75%,350.0,142.95,118.3,192.0,10.2,1029.4,824.0,56.0,46.135
max,540.0,359.4,200.1,247.0,32.2,1145.0,992.6,365.0,82.6


Display information about the DataFrame, including data types and non-null counts

In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1030 entries, 0 to 1029
Data columns (total 9 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Cement              1030 non-null   float64
 1   Blast Furnace Slag  1030 non-null   float64
 2   Fly Ash             1030 non-null   float64
 3   Water               1030 non-null   float64
 4   Superplasticizer    1030 non-null   float64
 5   Coarse Aggregate    1030 non-null   float64
 6   Fine Aggregate      1030 non-null   float64
 7   Age                 1030 non-null   int64  
 8   Strength            1030 non-null   float64
dtypes: float64(8), int64(1)
memory usage: 72.5 KB


Calculate the number of missing values (NaN) for each column in the DataFrame

In [None]:
df.isna().sum()

Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64

Create a list of features by selecting all columns except 'Strength'

In [None]:
features = [col for col in df.columns if col != 'Strength']
print('features:', features)

features: ['Cement', 'Blast Furnace Slag', 'Fly Ash', 'Water', 'Superplasticizer', 'Coarse Aggregate', 'Fine Aggregate', 'Age']


Extract the 'Strength' column as the target variable and convert it to a NumPy array

In [None]:
target = df.Strength.to_numpy()

Normalize the features by subtracting each column's mean and dividing by its standard deviation

In [None]:
# Import the StandardScaler class from scikit-learn's preprocessing module
from sklearn.preprocessing import StandardScaler

# Initialize the StandardScaler
scaler = StandardScaler()

# Normalize the selected columns using fit_transform
df[features] = scaler.fit_transform(df[features])
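
To make the transformation concrete, the following sketch reproduces what `StandardScaler` computes per column: subtract the column mean and divide by the population standard deviation (`ddof=0`). The toy array and the helper name `standardize` are illustrative assumptions, not part of the notebook.

```python
import numpy as np

def standardize(x):
    """Column-wise z-score using the population standard deviation (ddof=0)."""
    x = np.asarray(x, dtype=float)
    mean = x.mean(axis=0)
    std = x.std(axis=0)                  # NumPy's default ddof=0
    std = np.where(std == 0, 1.0, std)   # constant columns: avoid division by zero
    return (x - mean) / std

# Hypothetical toy data: 4 samples, 2 features
toy = np.array([[540.0, 162.0],
                [332.5, 228.0],
                [198.6, 192.0],
                [266.0, 228.0]])
z = standardize(toy)
print(z.mean(axis=0))  # approximately 0 for each column
print(z.std(axis=0))   # approximately 1 for each column
```

Note that `df.describe()` reports the sample standard deviation (`ddof=1`), so dividing by those values would give slightly different numbers than the scaler does.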

Display the first few rows of the DataFrame to get a glimpse of the data after normalization

In [None]:
df.head()

,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age,Strength
0,2.477915,-0.856888,-0.847144,-0.916764,-0.620448,0.863154,-1.21767,-0.279733,79.99
1,2.477915,-0.856888,-0.847144,-0.916764,-0.620448,1.056164,-1.21767,-0.279733,61.89
2,0.491425,0.795526,-0.847144,2.175461,-1.039143,-0.526517,-2.240917,3.553066,40.27
3,0.491425,0.795526,-0.847144,2.175461,-1.039143,-0.526517,-2.240917,5.057677,41.05
4,-0.790459,0.678408,-0.847144,0.488793,-1.039143,0.070527,0.647884,4.978487,44.3


## Task # 01
Randomly split the data into training and test sets, holding out 30% of the data for testing. You can use the train_test_split helper function from Scikit-learn.

In [None]:
# Split the dataset into training and testing sets with a 70-30 split ratio and a fixed random seed
x_train, x_test, y_train, y_test = train_test_split(df[features], target, test_size=0.3, random_state=0)

print(f'number of rows and number of features in x_train: {x_train.shape}')
print(f'number of rows in y_train: {y_train.shape}')
print(f'number of rows and number of features in x_test: {x_test.shape}')
print(f'number of rows in y_test: {y_test.shape}')

number of rows and number of features in x_train: (721, 8)
number of rows in y_train: (721,)
number of rows and number of features in x_test: (309, 8)
number of rows in y_test: (309,)
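
As a rough illustration of what the split does, the sketch below shuffles row indices with NumPy and holds out 30% of them. It is not scikit-learn's implementation; the helper name `manual_split` and the seed handling are assumptions made for this example, so the selected rows will differ from `train_test_split` even with the same seed.

```python
import math
import numpy as np

def manual_split(n_rows, test_size=0.3, seed=0):
    """Return (train_idx, test_idx) after shuffling row indices."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_rows)
    n_test = math.ceil(test_size * n_rows)   # test size is rounded up
    return idx[n_test:], idx[:n_test]

train_idx, test_idx = manual_split(1030)
print(len(train_idx), len(test_idx))  # 721 309
```

The row counts match the shapes printed above: 309 rows for testing and the remaining 721 for training.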


In [None]:
# Determine the number of columns in the feature dataset
n_cols = df[features].shape[1]
n_cols

8

## Task # 02
Train the model on the training data. The original task uses 50 epochs; Part C increases this to 100, as described in the main task below.

### Create Regression Model

1.   One hidden layer of 10 nodes, and a ReLU activation function.
2.   Use the adam optimizer and the mean squared error as the loss function.



In [None]:
# Define a function to create a regression model
def RegressionModel():
    # Create a sequential model
    model = Sequential()
    # One hidden layer of 10 nodes, and a ReLU activation function
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    # Add the output layer with 1 node
    model.add(Dense(1))

    # Compile the model using the Adam optimizer and mean squared error as the loss function
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model
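
To make the architecture concrete, this NumPy sketch mirrors the forward pass of a network with 8 inputs, one 10-unit ReLU hidden layer, and a single linear output, and counts its trainable parameters. The random weights are placeholders for illustration only; Keras uses its own initializers and learns the values during training.

```python
import numpy as np

n_inputs, n_hidden, n_outputs = 8, 10, 1

rng = np.random.default_rng(0)
# Placeholder weights (assumption): Keras initializes and trains these itself
W1 = rng.normal(size=(n_inputs, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(n_hidden, n_outputs))
b2 = np.zeros(n_outputs)

def forward(x):
    """Dense(10, relu) followed by Dense(1) with no activation."""
    hidden = np.maximum(0.0, x @ W1 + b1)   # ReLU
    return hidden @ W2 + b2                 # linear output for regression

n_params = W1.size + b1.size + W2.size + b2.size
print(n_params)                                  # 101 = (8*10 + 10) + (10*1 + 1)
print(forward(np.zeros((5, n_inputs))).shape)    # (5, 1)
```

The same parameter total should appear if you call `model.summary()` on the Keras model.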

In [None]:
# Create the regression model
model = RegressionModel()

## Main Task of Part C
Train for 100 epochs this time instead of 50.

In [None]:
# Set the number of training epochs
n_epochs = 100

# Train the model on the training data
model.fit(x_train, y_train, epochs=n_epochs, verbose=1)

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100
Epoch 21/100
Epoch 22/100
Epoch 23/100
Epoch 24/100
Epoch 25/100
Epoch 26/100
Epoch 27/100
Epoch 28/100
Epoch 29/100
Epoch 30/100
Epoch 31/100
Epoch 32/100
Epoch 33/100
Epoch 34/100
Epoch 35/100
Epoch 36/100
Epoch 37/100
Epoch 38/100
Epoch 39/100
Epoch 40/100
Epoch 41/100
Epoch 42/100
Epoch 43/100
Epoch 44/100
Epoch 45/100
Epoch 46/100
Epoch 47/100
Epoch 48/100
Epoch 49/100
Epoch 50/100
Epoch 51/100
Epoch 52/100
Epoch 53/100
Epoch 54/100
Epoch 55/100
Epoch 56/100
Epoch 57/100
Epoch 58/100
Epoch 59/100
Epoch 60/100
Epoch 61/100
Epoch 62/100
Epoch 63/100
Epoch 64/100
Epoch 65/100
Epoch 66/100
Epoch 67/100
Epoch 68/100
Epoch 69/100
Epoch 70/100
Epoch 71/100
Epoch 72/100
Epoch 73/100
Epoch 74/100
Epoch 75/100
Epoch 76/100
Epoch 77/100
Epoch 78

<keras.src.callbacks.History at 0x7b85b80f4b20>
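
For a sense of how much extra training 100 epochs represents, the sketch below counts weight updates. It assumes the Keras default `batch_size=32`, since the `model.fit` call above does not set one, and uses the 721 training rows reported earlier; `updates_for` is a hypothetical helper.

```python
import math

def updates_for(n_samples, epochs, batch_size=32):
    """Number of gradient updates: batches per epoch times epochs."""
    steps_per_epoch = math.ceil(n_samples / batch_size)  # last batch may be partial
    return steps_per_epoch * epochs

print(updates_for(721, 50))   # 1150
print(updates_for(721, 100))  # 2300
```

Under that assumption, doubling the epochs doubles the number of optimizer steps while the data and architecture stay the same.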

## Task # 03
Evaluate the model on the test data and compute the mean squared error between the predicted concrete strength and the actual concrete strength. You can use the mean_squared_error function from Scikit-learn.

In [None]:
# Make predictions on the test data
y_hat = model.predict(x_test, verbose=0)

# Calculate and print the mean squared error between the predicted and actual target values
mse = mean_squared_error(y_test, y_hat)

print(f'Mean Squared Error: {mse}')

Mean Squared Error: 156.1422615908695
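
The metric itself is simple to compute by hand, with one shape pitfall worth knowing: `model.predict` returns an array of shape `(n, 1)` while `y_test` has shape `(n,)`. `mean_squared_error` handles that, but a hand-written NumPy version silently broadcasts to an `(n, n)` matrix unless the prediction is flattened first. The values below are made up for illustration.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error after flattening both inputs to 1-D."""
    y_true = np.ravel(y_true)
    y_pred = np.ravel(y_pred)
    return float(np.mean((y_true - y_pred) ** 2))

y_true = np.array([30.0, 40.0, 50.0])          # shape (3,)
y_pred = np.array([[32.0], [37.0], [50.0]])    # shape (3, 1), like model.predict

print(mse(y_true, y_pred))                     # (4 + 9 + 0) / 3
print(((y_true - y_pred) ** 2).shape)          # (3, 3): unintended broadcasting
```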


## Task # 04
Repeat Tasks 1-3 50 times to build a list of 50 mean squared errors.

In [None]:
# Create an empty list to store mean squared errors
mse_list = []

# Set the number of runs
n_runs = 50

# Repeat the following steps 50 times, varying the random_state for data splitting
for i in range(n_runs):
    # Split the data into training and test sets (70% train, 30% test) with different random_state values
    x_train, x_test, y_train, y_test = train_test_split(df[features], target, test_size=0.3, random_state=i)

    # Create the regression model
    model = RegressionModel()

    # Train the model on the training data using n_epochs=100 epochs
    model.fit(x_train, y_train, epochs=n_epochs, verbose=0)

    # Make predictions on the test data
    y_hat = model.predict(x_test, verbose=0)

    # Evaluate the model on the test data and calculate mean squared error
    mean_square_error = mean_squared_error(y_test, y_hat)
    print(f'Mean Squared Error in Run {i + 1}: {mean_square_error}')

    # Append the calculated mean squared error to the list
    mse_list.append(mean_square_error)

Mean Squared Error in Run 1: 142.02257852905788
Mean Squared Error in Run 2: 177.075450003509
Mean Squared Error in Run 3: 170.7316726195524
Mean Squared Error in Run 4: 166.5516376283654
Mean Squared Error in Run 5: 194.92092022817022
Mean Squared Error in Run 6: 151.9904520330484
Mean Squared Error in Run 7: 197.03914839111167
Mean Squared Error in Run 8: 147.17028478507868
Mean Squared Error in Run 9: 159.26424266045717
Mean Squared Error in Run 10: 147.88943193249074
Mean Squared Error in Run 11: 146.8501517766702
Mean Squared Error in Run 12: 188.50327481951547
Mean Squared Error in Run 13: 161.31076705746725
Mean Squared Error in Run 14: 163.2615957730332
Mean Squared Error in Run 15: 196.0018563556164
Mean Squared Error in Run 16: 153.71617533923367
Mean Squared Error in Run 17: 146.17174523479625
Mean Squared Error in Run 18: 168.56139043476003
Mean Squared Error in Run 19: 131.5499288064481
Mean Squared Error in Run 20: 184.1480418924096
Mean Squared Error in Run 21: 158.80193

## Task # 05
Report the mean and the standard deviation of the mean squared errors.

In [None]:
# Convert the list of mean squared errors into a NumPy array
mse_list = np.array(mse_list)

# Calculate the mean and standard deviation of the mean squared errors from all runs
mean = np.mean(mse_list)
standard_deviation = np.std(mse_list)

# Report by printing the mean and standard deviation of the mean squared errors
print(f'Mean of {n_runs} runs: {mean}')
print(f'Standard Deviation of {n_runs} runs: {standard_deviation}')

Mean of 50 runs: 164.66501112174083
Standard Deviation of 50 runs: 16.62362282939442
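
Two details are worth keeping in mind when reporting these numbers: `np.std` defaults to the population formula (`ddof=0`), and the spread of individual runs is different from the uncertainty of the mean. The sketch below shows both on the first five run values printed above, rounded to two decimals; it is illustrative and does not reproduce the 50-run figures.

```python
import numpy as np

# First five run MSEs from the output above, rounded (illustration only)
runs = np.array([142.02, 177.08, 170.73, 166.55, 194.92])

mean = runs.mean()
std_population = runs.std()                   # ddof=0, what np.std(mse_list) computes
std_sample = runs.std(ddof=1)                 # ddof=1, unbiased variance estimate
std_error = std_sample / np.sqrt(len(runs))   # uncertainty of the mean itself

print(f'mean={mean:.2f}, std(ddof=0)={std_population:.2f}, '
      f'std(ddof=1)={std_sample:.2f}, standard error={std_error:.2f}')
```

With 50 runs the two standard deviations differ only slightly, but the standard error is much smaller than either, which matters when comparing mean MSEs between parts.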


## How does the mean of the mean squared errors compare to that from Step B?
**Mean of MSE**: The third approach (Part C, 100 epochs) has a mean MSE of 164.665 over 50 runs, the lowest of the three approaches, which indicates the best average predictive performance among them.

**Standard Deviation of MSE**: The standard deviation of MSE for the third approach is 16.624, about 10% of the mean, which suggests that the model's performance is fairly consistent across runs.

*Please take into consideration that when running these notebooks, the results may display variability owing to several factors. These factors may be random initialization, data partitioning, model training dynamics, variations in the versions of libraries such as Keras and TensorFlow, and/or additional external factors.*