<div align="center"> <h1 align="center"> REGRESSION USING KERAS </h1> </div>

### This case study builds a regression model with the Keras library to model the compressive strength of concrete. The goal is to experiment with the neural network by increasing the number of training epochs and changing the number of hidden layers, and to observe how these changes affect the model's performance.


### The data contains the compressive strength of different concrete samples, along with the amounts of each ingredient used to make them.

<div align="center"> <h1 align="center"> PART A </h1> </div>

## 1. Install necessary packages

In [7]:
!pip install numpy==1.21.4
!pip install pandas==1.3.4
!pip install keras==2.1.6
!pip install scikit-learn

Collecting keras==2.1.6
  Using cached Keras-2.1.6-py2.py3-none-any.whl (339 kB)
Installing collected packages: keras
  Attempting uninstall: keras
    Found existing installation: keras 2.8.0
    Uninstalling keras-2.8.0:
      Successfully uninstalled keras-2.8.0
[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow 2.8.0 requires keras<2.9,>=2.8.0rc0, but you have keras 2.1.6 which is incompatible.[0m[31m
[0mSuccessfully installed keras-2.1.6

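The pip output above reports that the pinned `keras==2.1.6` conflicts with the `tensorflow 2.8.0` already installed in this environment. Before importing anything, it can help to confirm which versions are actually active. The sketch below uses only the standard library's `importlib.metadata` (Python 3.8+); the helper `installed_version` is hypothetical, and the package list is simply the one used in this notebook.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Distribution names as published on PyPI (scikit-learn, not sklearn)
for pkg in ("numpy", "pandas", "scikit-learn", "keras", "tensorflow"):
    print(f"{pkg:<14} {installed_version(pkg) or 'not installed'}")
```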

## 2. Import necessary libraries

In [8]:
import pandas as pd
import numpy as np

## 3. Read in data and save dataframe

In [9]:
cd = pd.read_csv(r'/Users/priscilalopez-beltran/Desktop/PY4E/Keras DL model capstone/concrete_data.csv')  # concrete data -> cd
cd.head()

|   | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 | 79.99 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 | 61.89 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 40.27 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 | 41.05 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 | 44.3 |


## 4. Check integrity of data

**4.1. Number of data points**  
The data set is small (n = 1030), so we must be careful not to overfit the training data.

In [10]:
cd.describe()

|   | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| count | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 |
| mean | 281.167864 | 73.895825 | 54.18835 | 181.567282 | 6.20466 | 972.918932 | 773.580485 | 45.662136 | 35.817961 |
| std | 104.506364 | 86.279342 | 63.997004 | 21.354219 | 5.973841 | 77.753954 | 80.17598 | 63.169912 | 16.705742 |
| min | 102.0 | 0.0 | 0.0 | 121.8 | 0.0 | 801.0 | 594.0 | 1.0 | 2.33 |
| 25% | 192.375 | 0.0 | 0.0 | 164.9 | 0.0 | 932.0 | 730.95 | 7.0 | 23.71 |
| 50% | 272.9 | 22.0 | 0.0 | 185.0 | 6.4 | 968.0 | 779.5 | 28.0 | 34.445 |
| 75% | 350.0 | 142.95 | 118.3 | 192.0 | 10.2 | 1029.4 | 824.0 | 56.0 | 46.135 |
| max | 540.0 | 359.4 | 200.1 | 247.0 | 32.2 | 1145.0 | 992.6 | 365.0 | 82.6 |


**4.2. Check for missing values**  
There are no missing values in the data.

In [11]:
cd.isnull().sum()

Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64

## 5. Prepare Test and Train Data

**5.1. Randomly split the data into training and test sets, holding out 30% of the data for testing.**

In [12]:
from sklearn.model_selection import train_test_split  # splitting helper from scikit-learn's model_selection module

train_set, test_set = train_test_split(cd, test_size=0.30)  # hold out 30% of the data for testing
print(train_set.shape)  # (rows, columns)
print(test_set.shape)
print(test_set.shape)

(721, 9)
(309, 9)
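The shapes (721, 9) and (309, 9) follow from how `train_test_split` sizes a float `test_size`: as we understand it, scikit-learn rounds the test count up, `ceil(0.30 × 1030) = 309`, and gives the remaining 721 rows to training. A minimal standard-library sketch of a shuffled hold-out split under that assumption (the helper `split_indices` is hypothetical, not scikit-learn's implementation):

```python
import math
import random
from typing import List, Optional, Tuple


def split_indices(n_samples: int, test_size: float = 0.30,
                  seed: Optional[int] = None) -> Tuple[List[int], List[int]]:
    """Shuffle row indices and hold out ceil(test_size * n_samples) of them for testing."""
    n_test = math.ceil(test_size * n_samples)
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded generator -> reproducible split
    return indices[n_test:], indices[:n_test]  # (train, test)


train_idx, test_idx = split_indices(1030, test_size=0.30, seed=0)
print(len(train_idx), len(test_idx))  # 721 309
```

Passing `random_state` to `train_test_split` plays the same role as `seed` here; the notebook omits it, so each run draws a different split.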


**5.2. Separate the predictors from the target.**  
To build the model, we must define the predictor and target variables. In this case study, the variable of interest (target) is the *Strength* of the concrete, and all other columns are predictors.

In [13]:
cd_cols = train_set.columns

# Train set
predictors_train = train_set[cd_cols[cd_cols != 'Strength']] # all columns except Strength
target_train = train_set['Strength'] # only Strength column

# Test set
predictors_test = test_set[cd_cols[cd_cols != 'Strength']] # all columns except Strength
target_test = test_set['Strength'] # Strength column
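The boolean mask `cd_cols != 'Strength'` keeps every column except the target. The same predictor/target separation can be sketched without pandas using the standard `csv` module. The two sample rows below are copied from the `cd.head()` output, and the helper `split_features_target` is hypothetical.

```python
import csv
import io
from typing import List, Tuple

SAMPLE = """Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age,Strength
540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28,79.99
332.5,142.5,0.0,228.0,0.0,932.0,594.0,270,40.27
"""


def split_features_target(
    csv_text: str, target: str = "Strength"
) -> Tuple[List[str], List[List[float]], List[float]]:
    """Return (predictor column names, predictor rows, target values)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    predictor_cols = [c for c in reader.fieldnames if c != target]  # every column except the target
    X, y = [], []
    for row in reader:
        X.append([float(row[c]) for c in predictor_cols])
        y.append(float(row[target]))
    return predictor_cols, X, y


cols, X, y = split_features_target(SAMPLE)
print(len(cols), y)  # 8 [79.99, 40.27]
```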

In [14]:
# Save number of predictors
n_cols = predictors_train.shape[1] # shape[1] cols, shape[0] rows
print(n_cols)

8


## 6. Train the model

**6.1. Define the model and train it using the training data for 50 epochs.**

In [28]:
# Check the installed Keras version, upgrade Keras, then import it
!pip list | grep -i keras
!pip install keras --upgrade --log ./pip-keras.log
import keras

keras                        2.8.0
Keras-Preprocessing          1.1.2


In [29]:
from keras.models import Sequential
from keras.layers import Dense

# define regression model
def regression_model():
    # create model with one hidden layer using the add method
    model = Sequential()  # empty sequential model
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))  # hidden layer: 10 ReLU units fed by n_cols inputs
    model.add(Dense(1))  # output layer: one linear unit (predicted strength)
    
    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model
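For intuition about what `regression_model()` builds: each `Dense` layer computes `activation(inputs · W + b)`, so the hidden layer holds 8 × 10 weights plus 10 biases and the output layer 10 × 1 weights plus 1 bias, 101 trainable parameters in total (this should match the total reported by `model_1.summary()`). The pure-Python sketch below reproduces that forward pass with made-up random weights; `dense` and `param_count` are hypothetical helpers for illustration, not how Keras is implemented.

```python
import random
from typing import List

Matrix = List[List[float]]


def dense(inputs: List[float], weights: Matrix, biases: List[float], relu: bool) -> List[float]:
    """One fully connected layer: out[j] = act(sum_i inputs[i] * weights[i][j] + biases[j])."""
    out = []
    for j, b in enumerate(biases):
        z = sum(x * weights[i][j] for i, x in enumerate(inputs)) + b
        out.append(max(0.0, z) if relu else z)  # ReLU for the hidden layer, identity for the output
    return out


def param_count(layer_sizes: List[int]) -> int:
    """Weights plus biases for a stack of Dense layers, e.g. [8, 10, 1]."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


rng = random.Random(0)
n_cols, n_hidden = 8, 10
W1 = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(n_cols)]  # made-up weights
b1 = [0.0] * n_hidden
W2 = [[rng.uniform(-0.1, 0.1)] for _ in range(n_hidden)]
b2 = [0.0]

sample = [540.0, 0.0, 0.0, 162.0, 2.5, 1040.0, 676.0, 28.0]  # first row of cd.head(), without Strength
hidden = dense(sample, W1, b1, relu=True)
prediction = dense(hidden, W2, b2, relu=False)[0]
print(param_count([n_cols, n_hidden, 1]))  # 101
```

Adding a second hidden layer of 10 units, as the later experiments do, would raise the count to `param_count([8, 10, 10, 1])`, i.e. 211.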

In [30]:
# build the model
model_1 = regression_model()

In [31]:
# train the model with the fit method, iterating over the entire training set for 50 epochs
model_1.fit(predictors_train,
            target_train,
            epochs=50,
            validation_split=0.3,  # hold out 30% of the training rows and report validation loss after each epoch
            verbose=0)  # suppress per-epoch progress output

<keras.callbacks.History at 0x1599dd330>
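`validation_split=0.3` does not draw a random sample. According to the Keras `fit` documentation, the validation data is taken from the last samples of the arrays passed in, before any shuffling. Assuming Keras rounds the split point down, the 721 training rows become 504 rows for fitting and 217 for validation. The sketch below (hypothetical helper `holdout_tail`) reproduces that slicing under this assumption.

```python
import math
from typing import Sequence, Tuple


def holdout_tail(rows: Sequence, validation_split: float) -> Tuple[list, list]:
    """Split off the *last* fraction of rows, mirroring the documented validation_split behaviour."""
    split_at = math.floor(len(rows) * (1.0 - validation_split))  # assumed rounding: down
    return list(rows[:split_at]), list(rows[split_at:])


fit_rows, val_rows = holdout_tail(range(721), validation_split=0.3)
print(len(fit_rows), len(val_rows))  # 504 217
```

Because the split is positional, it is only representative if the rows are already in random order; here `train_test_split` shuffled them first (its default), so that condition holds.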

**6.2. Evaluate the model on the test data.**

In [32]:
eval_results = model_1.evaluate(predictors_test, target_test, batch_size=128)
print(eval_results)

408.984130859375


**6.3. Compute the mean squared error between the predicted concrete strength and the actual concrete strength. Use the mean_squared_error function from [Scikit-learn](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_squared_error.html?highlight=squared#sklearn.metrics.mean_squared_error)**

In [43]:
# Predict concrete strength for the test set (despite its name, train_results_1 holds test-set predictions)
train_results_1 = model_1.predict(predictors_test)
print(train_results_1)

[[ 1.55064726e+01]
 [ 1.26802530e+01]
 [ 1.87120380e+01]
 [ 5.49722137e+01]
 [ 1.86434536e+01]
 [ 2.31540508e+01]
 [ 3.98688011e+01]
 [ 1.85894947e+01]
 [ 6.80987320e+01]
 [ 1.24111118e+01]
 [ 1.00801802e+01]
 [ 6.55459290e+01]
 [ 1.26170969e+01]
 [ 9.16747131e+01]
 [ 8.23725033e+00]
 [ 3.29463043e+01]
 [ 7.22047348e+01]
 [ 9.88663769e+00]
 [ 2.65036983e+01]
 [ 2.13974438e+01]
 [ 1.66301823e+01]
 [ 6.80987320e+01]
 [ 3.19917660e+01]
 [ 2.19986515e+01]
 [ 1.71444988e+01]
 [ 3.77102242e+01]
 [ 8.58236542e+01]
 [ 8.00694942e+00]
 [ 3.37845535e+01]
 [ 1.12865171e+01]
 [ 1.65908756e+01]
 [ 3.75638733e+01]
 [ 2.51944637e+01]
 [ 3.51445808e+01]
 [ 3.76053391e+01]
 [ 9.61383533e+00]
 [ 1.11468115e+01]
 [ 6.66894770e+00]
 [ 5.28744354e+01]
 [ 6.39473963e+00]
 [ 1.80044003e+01]
 [ 8.96633148e+01]
 [ 5.52479630e+01]
 [ 6.09844055e+01]
 [ 1.02914438e+01]
 [ 3.94494438e+01]
 [ 1.00940619e+01]
 [ 8.32888336e+01]
 [ 2.26504002e+01]
 [ 6.18776941e+00]
 [ 1.37498312e+01]
 [ 3.35650940e+01]
 [ 4.2248378 ...

In [44]:
# Compute the mean squared error 
from sklearn.metrics import mean_squared_error
mean_squared_error(target_test, train_results_1)

408.9841551639459
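The scikit-learn MSE (408.9841...) agrees with the loss returned by `model_1.evaluate` (408.9841...) because the model was compiled with `loss='mean_squared_error'`; the tiny difference likely reflects float32 arithmetic inside Keras. The metric is the average squared residual, MSE = (1/n) Σ (yᵢ − ŷᵢ)². A standard-library sketch (hypothetical helper `mse`) that also accepts the (n, 1) column shape that `predict` returns:

```python
from typing import Sequence, Union

Number = Union[int, float]


def mse(y_true: Sequence[Number], y_pred: Sequence) -> float:
    """Mean squared error; y_pred may be flat or a column of one-element rows like [[15.5], [12.7]]."""
    flat_pred = [p[0] if isinstance(p, (list, tuple)) else p for p in y_pred]
    if len(flat_pred) != len(y_true):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, flat_pred)) / len(y_true)


print(mse([3, -0.5, 2, 7], [2.5, 0.0, 2, 8]))          # 0.375
print(mse([3, -0.5, 2, 7], [[2.5], [0.0], [2], [8]]))  # 0.375
```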

## 7. Repeat the training and evaluation 50 times to create a list of 50 mean squared errors.



In [46]:
# build a second model
model_2 = regression_model()

# list to collect the test-set mean squared error of each iteration
model_res = []

# train and evaluate 50 times; model_2 is built only once and the split is fixed,
# so each fit() call continues training the same weights for another 50 epochs
for x in range(50):
    model_2.fit(predictors_train, target_train, epochs=50, verbose=0)
    train_results_2 = model_2.predict(predictors_test)
    model_res.append(mean_squared_error(target_test, train_results_2))
print(model_res)

[160.76386886901307, 108.38954551641314, 79.39520210882435, 68.39246092882793, 67.57831608286483, 71.58965331977733, 78.07409942293408, 63.3765400196226, 63.41840904490186, 61.02146271880915, 64.95648293101362, 57.68202426079558, 59.84376514485404, 57.08872951591511, 52.19014389488179, 45.28447863811512, 41.513367177362575, 39.86965409598209, 41.61301344664871, 39.70288292151564, 42.777167302027266, 42.07680363862309, 46.174661321913966, 41.26423087978021, 42.5218043052612, 40.16569005626375, 46.13387703256152, 40.574155657503155, 42.69262430155855, 40.83342217550628, 41.87027012390321, 40.94976433889229, 42.94061089689753, 41.933083260697444, 47.089614540838, 44.48537430018752, 42.33422195507974, 46.98529703041056, 40.516416488720196, 40.77252813192613, 41.3740844433249, 44.52657110402142, 49.99804804802403, 40.50152874741349, 43.96724349856839, 41.393170460926775, 40.686666189431214, 40.684603242067176, 42.91974878256074, 51.50379871860091]
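Because `model_2` is created once and the split never changes, the list above tracks a single model as it accumulates 50 × 50 = 2,500 epochs, which is why the errors fall from about 161 and then settle between roughly 40 and 50. If the intent is instead to measure variability across independent runs, each iteration would draw a new split and train a fresh model. The sketch below shows that loop structure in plain Python. `repeated_holdout` and `fit_and_score` are hypothetical names, and the toy scorer (predict the training mean) stands in for building `regression_model()`, fitting it for 50 epochs, and returning `mean_squared_error` on the test rows.

```python
import math
import random
import statistics
from typing import Callable, List, Sequence, Tuple

Row = Tuple[List[float], float]  # (predictors, target)


def repeated_holdout(rows: Sequence[Row], n_runs: int, test_size: float,
                     fit_and_score: Callable[[List[Row], List[Row]], float],
                     seed: int = 0) -> List[float]:
    """Return one test score per run, each from a fresh random split and a freshly trained model."""
    rng = random.Random(seed)
    n_test = math.ceil(test_size * len(rows))
    scores = []
    for _ in range(n_runs):
        shuffled = list(rows)
        rng.shuffle(shuffled)  # new split every run
        test, train = shuffled[:n_test], shuffled[n_test:]
        scores.append(fit_and_score(train, test))  # the model must be rebuilt inside this call
    return scores


def mean_baseline(train: List[Row], test: List[Row]) -> float:
    """Toy stand-in for a Keras model: predict the mean training target, return the test MSE."""
    guess = statistics.mean(y for _, y in train)
    return statistics.mean((y - guess) ** 2 for _, y in test)


data_rng = random.Random(42)
toy_rows = [([float(x)], 2.0 * x + data_rng.gauss(0, 1)) for x in range(100)]  # synthetic data
scores = repeated_holdout(toy_rows, n_runs=50, test_size=0.30, fit_and_score=mean_baseline)
print(len(scores), round(statistics.mean(scores), 2), round(statistics.stdev(scores), 2))
```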


## 8. Report the mean and the standard deviation of the mean squared errors.

In [48]:
import statistics as st

mean = st.mean(model_res)
print(mean)

stdev = st.stdev(model_res)
print(stdev)

52.28782362065127
20.773659609192705


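The mean of the 50 mean squared errors is 52.28782362065127.  
The standard deviation of the 50 mean squared errors is 20.773659609192705. Both figures are pulled upward by the first few iterations (160.76 and 108.39), when `model_2` had been trained for only 50 and 100 epochs in total.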
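`statistics.stdev` computes the *sample* standard deviation (dividing by n − 1), while NumPy's `np.std` defaults to the *population* form (`ddof=0`, dividing by n), so the two can disagree slightly on the same list. A short standard-library illustration on a toy list (not the notebook's results):

```python
import math
import statistics

values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # toy data

sample_sd = statistics.stdev(values)       # sqrt(sum((x - mean)^2) / (n - 1))
population_sd = statistics.pstdev(values)  # sqrt(sum((x - mean)^2) / n)

print(statistics.mean(values))  # 5.0
print(population_sd)            # 2.0
print(round(sample_sd, 4))      # 2.1381
print(math.isclose(sample_sd / population_sd, math.sqrt(8 / 7)))  # True
```

For the 50 errors above, the two conventions differ by a factor of sqrt(50/49), roughly 1.01.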