
# Final Project: Build a Deep Learning Model

Date: 4/11/2024

## Import the Needed Packages

In [1]:
import pandas as pd
import numpy as np
import keras
from keras.models import Sequential
from keras.layers import Dense
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

## About the Dataset

The dataset records the compressive strength of concrete samples (in MPa) together with the quantity of each ingredient used to make them (in kg per m³ of mixture) and the age of each sample (in days). The ingredients are:

1. Cement

2. Blast Furnace Slag

3. Fly Ash

4. Water

5. Superplasticizer

6. Coarse Aggregate

7. Fine Aggregate

## Download and Clean the Dataset

In [2]:
concrete_data = pd.read_csv("https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DL0101EN/labs/data/concrete_data.csv")
concrete_data.head()

|   | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 | 79.99 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 | 61.89 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 40.27 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 | 41.05 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 | 44.3 |


Let's check how many data points we have by using the `.shape` attribute.

In [3]:
concrete_data.shape

(1030, 9)

The dataset has 1,030 rows (data points) and 9 columns: 8 features plus the `Strength` target.

Now, let's check the dataset for missing values.

In [4]:
concrete_data.isnull().sum()

Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64

Good! There are no missing values in the dataset.

## Split the Data into Features and Target

In [5]:
concrete_data_columns = concrete_data.columns

X = concrete_data[concrete_data_columns[concrete_data_columns != "Strength"]]
y = concrete_data["Strength"]
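The boolean-mask selection above keeps every column except `Strength`. An equivalent and arguably more readable option is `DataFrame.drop(columns=...)`. The sketch below checks that the two give the same result on a small hypothetical frame (a few values borrowed from the preview above, not the full dataset):

```python
import pandas as pd

# Hypothetical toy frame with the same layout as concrete_data:
# feature columns followed by the "Strength" target column.
df = pd.DataFrame({
    "Cement": [540.0, 332.5, 198.6],
    "Water": [162.0, 228.0, 192.0],
    "Age": [28, 270, 360],
    "Strength": [79.99, 40.27, 44.3],
})

# Approach used in the notebook: boolean mask over the column index.
X_mask = df[df.columns[df.columns != "Strength"]]

# Alternative: drop the target column explicitly.
X_toy = df.drop(columns="Strength")
y_toy = df["Strength"]

print(list(X_toy.columns))
print(X_toy.shape, y_toy.shape)
print(X_toy.equals(X_mask))  # both approaches select the same columns
```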

Let's quickly check the shapes of `X` and `y`.

In [6]:
X.shape

(1030, 8)

In [7]:
y.shape

(1030,)

## Build a Regression Model in Keras

This regression model has 8 inputs, one hidden layer with 10 nodes and ReLU activation, and a single linear output node. It is trained with the `adam` optimizer, and the loss function is mean squared error.
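To make the architecture concrete, here is a NumPy-only sketch of the same 8 → 10 → 1 computation and of the mean squared error loss. The weights are random and purely illustrative (Keras initializes and then trains its own), so only the shapes, the parameter count, and the loss formula carry over to the Keras model; `forward` and `mse_loss` are hypothetical helper names.

```python
import numpy as np

rng = np.random.default_rng(0)

n_inputs, n_hidden = 8, 10

# Illustrative random weights; Keras Dense layers instead default to
# Glorot-uniform kernels and zero biases before training.
W1 = rng.normal(size=(n_inputs, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(n_hidden, 1))
b2 = np.zeros(1)

def forward(X):
    """ReLU hidden layer followed by a linear (no activation) output."""
    hidden = np.maximum(0.0, X @ W1 + b1)  # ReLU
    return hidden @ W2 + b2

def mse_loss(y_true, y_pred):
    """Mean squared error: the average of the squared residuals."""
    return np.mean((y_true - y_pred) ** 2)

# Trainable parameters: weights plus biases of both Dense layers.
n_params = W1.size + b1.size + W2.size + b2.size
print(n_params)  # 8*10 + 10 + 10*1 + 1 = 101

X_batch = rng.normal(size=(5, n_inputs))  # 5 hypothetical samples
print(forward(X_batch).shape)             # one prediction per sample

# Worked example: residuals 1 and -2 give (1 + 4) / 2 = 2.5
print(mse_loss(np.array([3.0, 5.0]), np.array([2.0, 7.0])))
```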

In [8]:
def regression_model():
  # create a model
  model = Sequential()
  model.add(Dense(10, activation="relu", input_shape=(X.shape[1],)))
  model.add(Dense(1))

  # compile the model
  model.compile(optimizer="adam", loss="mean_squared_error")
  return model

In [9]:
# build the model
model = regression_model()

### Part A

#### Train and Test the Model

In [10]:
mse = []
for i in range(50):
  X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
  model.fit(X_train, y_train, epochs=50, verbose=0)
  predicts = model.predict(X_test)
  mse.append(mean_squared_error(y_test, predicts))

print(f"Mean of the Mean Squared Error: {np.mean(mse)}")
print(f"Standard deviation of the Mean Squared Error: {np.std(mse)}")

Mean of the Mean Squared Error: 62.023841522935655
Standard deviation of the Mean Squared Error: 41.344918367640034
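In the loop above, `model` is built once (cell [9]) and `model.fit` is called on it 50 times, so each call continues from the previous weights, and rows in a new test split may already have been used for training in an earlier iteration. If the goal is 50 independent train/evaluate runs, one option is to call `regression_model()` inside the loop. The sketch below shows that pattern with a stand-in model so it runs without Keras: ordinary least squares via `np.linalg.lstsq` on hypothetical synthetic data. `fit_fresh_model` and the data are assumptions, not part of the notebook.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical synthetic data: 1030 rows, 8 features, linear target plus noise.
n_samples, n_features = 1030, 8
X_syn = rng.normal(size=(n_samples, n_features))
true_w = rng.normal(size=n_features)
y_syn = X_syn @ true_w + rng.normal(scale=0.5, size=n_samples)

def fit_fresh_model(X_train, y_train):
    """Stand-in for regression_model() + fit(): a brand-new least-squares
    model (with intercept) is fitted from scratch on every call."""
    A = np.column_stack([X_train, np.ones(len(X_train))])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return coef

def predict(coef, X):
    return np.column_stack([X, np.ones(len(X))]) @ coef

mse_scores = []
n_test = int(round(0.3 * n_samples))  # 30% test split
for _ in range(50):
    # Redraw a random train/test split on every repetition.
    idx = rng.permutation(n_samples)
    test_idx, train_idx = idx[:n_test], idx[n_test:]

    coef = fit_fresh_model(X_syn[train_idx], y_syn[train_idx])  # no carry-over
    residuals = y_syn[test_idx] - predict(coef, X_syn[test_idx])
    mse_scores.append(np.mean(residuals ** 2))

print(f"Mean of the Mean Squared Error: {np.mean(mse_scores)}")
print(f"Standard deviation of the Mean Squared Error: {np.std(mse_scores)}")
```

`np.std` defaults to the population standard deviation (`ddof=0`), the same call the notebook uses to report its spread.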


### Part B

Let's normalize the feature data by subtracting each column's mean and dividing by its standard deviation.

In [12]:
X_norm = (X - X.mean()) / X.std()
X_norm.head()

|   | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 2.476712 | -0.856472 | -0.846733 | -0.916319 | -0.620147 | 0.862735 | -1.217079 | -0.279597 |
| 1 | 2.476712 | -0.856472 | -0.846733 | -0.916319 | -0.620147 | 1.055651 | -1.217079 | -0.279597 |
| 2 | 0.491187 | 0.79514 | -0.846733 | 2.174405 | -1.038638 | -0.526262 | -2.239829 | 3.55134 |
| 3 | 0.491187 | 0.79514 | -0.846733 | 2.174405 | -1.038638 | -0.526262 | -2.239829 | 5.055221 |
| 4 | -0.790075 | 0.678079 | -0.846733 | 0.488555 | -1.038638 | 0.070492 | 0.647569 | 4.976069 |
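The cell above standardizes with the mean and standard deviation of the full dataset, computed before the train/test split. A common refinement is to compute these statistics on the training split only and reuse them for the test split, so no information from the test rows leaks into preprocessing. A minimal sketch on a hypothetical two-column frame; `standardize` is a hypothetical helper, and note that pandas' `.std()` uses the sample standard deviation (`ddof=1`) by default, matching `X.std()` above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Hypothetical toy features on very different scales.
df = pd.DataFrame({
    "Cement": rng.uniform(100, 540, size=100),
    "Age": rng.integers(1, 365, size=100),
})

def standardize(train, test):
    """Z-score both splits using only the training split's statistics."""
    mean, std = train.mean(), train.std()  # pandas std: ddof=1
    return (train - mean) / std, (test - mean) / std

train_df, test_df = df.iloc[:70], df.iloc[70:]
train_norm, test_norm = standardize(train_df, test_df)

# Training columns now have mean ~0 and std ~1; the test split is only
# approximately standardized because it reuses the training statistics.
print(train_norm.mean())
print(train_norm.std())
```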


#### Train and Test the Model with Normalized Data

In [13]:
mse = []
for i in range(50):
  X_train, X_test, y_train, y_test = train_test_split(X_norm, y, test_size=0.3)
  model.fit(X_train, y_train, epochs=50, verbose=0)
  predicts = model.predict(X_test)
  mse.append(mean_squared_error(y_test, predicts))

print(f"Mean of the Mean Squared Error: {np.mean(mse)}")
print(f"Standard deviation of the Mean Squared Error: {np.std(mse)}")

Mean of the Mean Squared Error: 43.903917831649714
Standard deviation of the Mean Squared Error: 24.602467920303006


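The mean MSE in Part B (about 43.9) is lower than in Part A (about 62.0), and the standard deviation also drops (about 24.6 versus 41.3). Note, however, that Part B keeps training the same `model` object used in Part A rather than a freshly built one, so part of this improvement may come from the additional training rather than from normalization alone.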

### Part C

#### Train and Test the Model with 100 `epochs`

In [14]:
mse = []
for i in range(50):
  X_train, X_test, y_train, y_test = train_test_split(X_norm, y, test_size=0.3)
  model.fit(X_train, y_train, epochs=100, verbose=0)
  predicts = model.predict(X_test)
  mse.append(mean_squared_error(y_test, predicts))

print(f"Mean of the Mean Squared Error: {np.mean(mse)}")
print(f"Standard deviation of the Mean Squared Error: {np.std(mse)}")

Mean of the Mean Squared Error: 34.21603413616552
Standard deviation of the Mean Squared Error: 2.3667481752674298


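The mean MSE in Part C (about 34.2) is lower than in Part B (about 43.9), and the standard deviation falls sharply (about 2.4 versus 24.6). As in Part B, the same `model` object keeps training, so these runs also build on all the epochs from Parts A and B.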
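An epoch is one full pass over the training split. Because `model.fit` is called without `batch_size`, Keras uses its default of 32 samples per gradient step, and `train_test_split` with `test_size=0.3` rounds the test size up. The arithmetic below, a sketch assuming those defaults, shows how many weight updates a single `fit` call performs with 50 versus 100 epochs:

```python
import math

n_rows = 1030
test_size = 0.3
batch_size = 32  # Keras default when fit() is called without batch_size

n_test = math.ceil(test_size * n_rows)  # train_test_split rounds the test size up
n_train = n_rows - n_test
steps_per_epoch = math.ceil(n_train / batch_size)  # last batch may be partial

print(n_train, n_test, steps_per_epoch)
for epochs in (50, 100):
    print(epochs, "epochs ->", epochs * steps_per_epoch, "weight updates per fit() call")
```

So doubling the epochs doubles the number of gradient updates in each `fit` call (1,150 versus 2,300 here).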

### Part D

Let's define a deeper regression model with three hidden layers of 10 nodes each, all using ReLU activation.

In [15]:
def regression_model_2():
  # create a model
  model = Sequential()
  model.add(Dense(10, activation="relu", input_shape=(X.shape[1],)))
  model.add(Dense(10, activation="relu"))
  model.add(Dense(10, activation="relu"))
  model.add(Dense(1))

  # compile the model
  model.compile(optimizer="adam", loss="mean_squared_error")
  return model
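One way to compare the two architectures is by their number of trainable parameters: each Dense layer contributes `inputs × units` weights plus `units` biases. The helper below is a hypothetical sketch of that count; the totals should match what Keras reports via `model.count_params()` or `model.summary()` for `regression_model()` and `regression_model_2()`.

```python
def dense_param_count(layer_sizes):
    """Count weights + biases for a stack of fully connected layers.

    layer_sizes lists the input width followed by each Dense layer's units.
    """
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

print(dense_param_count([8, 10, 1]))          # regression_model(): 90 + 11
print(dense_param_count([8, 10, 10, 10, 1]))  # regression_model_2(): 90 + 110 + 110 + 11
```

The deeper model has roughly three times as many parameters (321 versus 101).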

#### Train and Test the New Regression Model

In [16]:
model_2 = regression_model_2()

In [17]:
mse = []
for i in range(50):
  X_train, X_test, y_train, y_test = train_test_split(X_norm, y, test_size=0.3)
  model_2.fit(X_train, y_train, epochs=50, verbose=0)
  predicts = model_2.predict(X_test)
  mse.append(mean_squared_error(y_test, predicts))

print(f"Mean of the Mean Squared Error: {np.mean(mse)}")
print(f"Standard deviation of the Mean Squared Error: {np.std(mse)}")

Mean of the Mean Squared Error: 27.800787393278203
Standard deviation of the Mean Squared Error: 15.531723597333466


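The mean MSE in Part D (about 27.8) is lower than in Part B (about 43.9), which uses the same normalized data and 50 epochs but only one hidden layer. Unlike the model in Parts B and C, `model_2` starts from freshly initialized weights, so this improvement cannot be attributed to training carried over from earlier parts.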
