<div align="center"> <h1 align="center"> REGRESSION USING KERAS </h1> </div>

### This case study builds a regression model with the Keras library to predict the compressive strength of concrete. The goal is to experiment with the neural network by increasing the number of training epochs and changing the number of hidden layers, and to observe how these changes affect the model's performance.


### The data contains the compressive strength of different concrete samples, together with the amount of each ingredient used to make them and the age of each sample.

## 1. Install necessary packages

In [18]:
!pip install numpy==1.21.4
!pip install pandas==1.3.4
!pip install keras==2.1.6
!pip install scikit-learn

Collecting keras==2.1.6
  Using cached Keras-2.1.6-py2.py3-none-any.whl (339 kB)
Installing collected packages: keras
  Attempting uninstall: keras
    Found existing installation: keras 2.8.0
    Uninstalling keras-2.8.0:
      Successfully uninstalled keras-2.8.0
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
tensorflow 2.8.0 requires keras<2.9,>=2.8.0rc0, but you have keras 2.1.6 which is incompatible.
Successfully installed keras-2.1.6


## 2. Import necessary modules

In [2]:
import pandas as pd
import numpy as np

## 3. Read in data and save dataframe

In [5]:
cd = pd.read_csv(r'/Users/priscilalopez-beltran/Desktop/PY4E/Keras DL model capstone/concrete_data.csv')
cd.head()  # cd = concrete data

|   | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 | 79.99 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 | 61.89 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 40.27 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 | 41.05 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 | 44.3 |


## 4. Check integrity of data

**4.1. Number of data points**  
   The data set is small (n = 1030 rows, per the `count` row below), so we must be careful not to overfit the training data.

In [6]:
cd.describe()

|   | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| count | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 |
| mean | 281.167864 | 73.895825 | 54.18835 | 181.567282 | 6.20466 | 972.918932 | 773.580485 | 45.662136 | 35.817961 |
| std | 104.506364 | 86.279342 | 63.997004 | 21.354219 | 5.973841 | 77.753954 | 80.17598 | 63.169912 | 16.705742 |
| min | 102.0 | 0.0 | 0.0 | 121.8 | 0.0 | 801.0 | 594.0 | 1.0 | 2.33 |
| 25% | 192.375 | 0.0 | 0.0 | 164.9 | 0.0 | 932.0 | 730.95 | 7.0 | 23.71 |
| 50% | 272.9 | 22.0 | 0.0 | 185.0 | 6.4 | 968.0 | 779.5 | 28.0 | 34.445 |
| 75% | 350.0 | 142.95 | 118.3 | 192.0 | 10.2 | 1029.4 | 824.0 | 56.0 | 46.135 |
| max | 540.0 | 359.4 | 200.1 | 247.0 | 32.2 | 1145.0 | 992.6 | 365.0 | 82.6 |


**4.2. Check for missing values**  
There are no missing values in the data.

In [7]:
cd.isnull().sum()

Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64

## 5. Prepare Test and Train Data

**5.1. Randomly split the data into training and test sets, holding out 30% of the data for testing.**

In [8]:
from sklearn.model_selection import train_test_split  # splits a DataFrame into random train and test subsets

train_set, test_set = train_test_split(cd, test_size=0.30)  # hold out 30% of the rows for testing
print(train_set.shape)  # (rows, columns)
print(test_set.shape)

(721, 9)
(309, 9)
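`train_test_split` shuffles the rows before splitting, so the split (and every metric computed from it) changes on each run unless `random_state` is fixed. A minimal sketch on a synthetic frame with the same number of rows as the concrete data; the seed value 42 is an arbitrary assumption:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in with the same shape as concrete_data.csv (1030 rows, 9 columns).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(1030, 9)), columns=[f"c{i}" for i in range(9)])

# Fixing random_state makes the shuffle, and therefore the split, reproducible.
train_a, test_a = train_test_split(df, test_size=0.30, random_state=42)
train_b, test_b = train_test_split(df, test_size=0.30, random_state=42)

print(train_a.shape, test_a.shape)           # same sizes as the cell above: 721 and 309 rows
print(train_a.index.equals(train_b.index))   # True: identical rows on both calls
```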


**5.2. Split data into target and predictors.**  
To create the model we must separate the predictor and target variables. In this case study, the variable of interest (target) is the compressive *strength* of the concrete, and all other columns are predictors.

In [9]:
cd_cols = train_set.columns

# Train set
predictors_train = train_set[cd_cols[cd_cols != 'Strength']] # all columns except Strength
target_train = train_set['Strength'] # only Strength column

# Test set
predictors_test = test_set[cd_cols[cd_cols != 'Strength']] # all columns except Strength
target_test = test_set['Strength'] # Strength column
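The same separation can be written with `DataFrame.drop`, which avoids building a boolean column mask. A small sketch using rows 0 and 2 from the preview in section 3, keeping only three columns for brevity:

```python
import pandas as pd

# Tiny illustrative frame; column names and values come from the cd.head() preview.
frame = pd.DataFrame({
    "Cement": [540.0, 332.5],
    "Water": [162.0, 228.0],
    "Strength": [79.99, 40.27],
})

predictors = frame.drop(columns="Strength")  # every column except the target
target = frame["Strength"]                   # the target as a Series

print(list(predictors.columns))  # ['Cement', 'Water']
```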

**5.3. Normalize data using z-scores: z = (x-μ)/σ**

In [26]:
predictors_norm_train = (predictors_train - predictors_train.mean()) / predictors_train.std()
predictors_norm_test = (predictors_test - predictors_test.mean()) / predictors_test.std()

target_norm_train = (target_train - target_train.mean()) / target_train.std()
target_norm_test = (target_test - target_test.mean()) / target_test.std()
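The cell above standardizes the test set with its own mean and standard deviation. A common alternative is to reuse the *training* statistics for both sets, so nothing computed from the test data enters preprocessing and both sets sit on the same scale. A minimal sketch on synthetic data; `zscore_with_train_stats` is a hypothetical helper name:

```python
import numpy as np
import pandas as pd


def zscore_with_train_stats(train, test):
    """Standardize both frames with the training mean and std, leaving the test set unseen."""
    mu = train.mean()
    sigma = train.std()  # pandas default ddof=1, as in the cell above
    return (train - mu) / sigma, (test - mu) / sigma


# Synthetic data: the test set is deliberately shifted (mean 55 vs 50).
rng = np.random.default_rng(1)
train = pd.DataFrame(rng.normal(50, 10, size=(700, 3)), columns=["a", "b", "c"])
test = pd.DataFrame(rng.normal(55, 10, size=(300, 3)), columns=["a", "b", "c"])

train_z, test_z = zscore_with_train_stats(train, test)
print(train_z.mean().round(6).tolist())  # approximately zero for every column
print(test_z.mean().round(2).tolist())   # the shift survives, roughly +0.5 per column
```

Standardizing each set with its own statistics would erase that shift in the test set. The same pattern applies to the target; keeping the training mean and standard deviation also lets normalized predictions be converted back to strength values.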

In [27]:
# Save number of predictors
n_cols = predictors_norm_train.shape[1] # shape[1] cols, shape[0] rows
print(n_cols)

8


## 6. Train the model with Keras

**6.1. Define a model with one hidden layer, use the adam optimizer and the mean squared error loss function. Then, train the model using the training data for 50 epochs.**

In [21]:
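# Keras 2.1.6 (installed in step 1) conflicts with TensorFlow 2.8.0, which requires keras>=2.8.0rc0,<2.9,
# so check the installed version and upgrade Keras before importing it
!pip list | grep -i keras
!pip install keras --upgrade --log ./pip-keras.log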
import keras

Keras                        2.1.6
Keras-Preprocessing          1.1.2
Collecting keras
  Using cached keras-2.8.0-py2.py3-none-any.whl (1.4 MB)
Installing collected packages: keras
  Attempting uninstall: keras
    Found existing installation: Keras 2.1.6
    Uninstalling Keras-2.1.6:
      Successfully uninstalled Keras-2.1.6
Successfully installed keras-2.8.0
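Since step 1 pinned Keras 2.1.6 and TensorFlow 2.8.0 then rejected it, it can help to check installed versions from Python before importing. A small sketch using the standard-library `importlib.metadata` (Python 3.8+); `installed_version` is a hypothetical helper name:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a distribution, or None if it is not installed."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Check the pair that pip flagged as incompatible in step 1.
for name in ("keras", "tensorflow"):
    print(name, installed_version(name))
```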


In [28]:
# import necessary keras modules
from keras.models import Sequential
from keras.layers import Dense

# define regression model
def regression_model():
    # create a Sequential model (layers stacked in order) using the add method
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))  # hidden layer: 10 nodes, ReLU activation
    model.add(Dense(1))  # output layer: one node with linear activation for the regression target

    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model
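As a worked check on the architecture, each `Dense` layer holds `(inputs + 1) * units` parameters: one weight per input plus one bias per unit. The sketch below, with a hypothetical helper `dense_param_count`, gives 101 trainable parameters for the single hidden layer of 10 nodes on 8 predictors defined above, versus 321 for three hidden layers of 10; these totals should match what `model.summary()` reports for such stacks of `Dense` layers.

```python
def dense_param_count(n_inputs, hidden_units, n_outputs=1):
    """Count weights + biases in a stack of fully connected layers."""
    total = 0
    fan_in = n_inputs
    for units in list(hidden_units) + [n_outputs]:
        total += (fan_in + 1) * units  # +1 for each unit's bias
        fan_in = units
    return total


print(dense_param_count(8, [10]))          # 101 = (8+1)*10 + (10+1)*1
print(dense_param_count(8, [10, 10, 10]))  # 321 = 90 + 110 + 110 + 11
```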

In [23]:
# build the model
model_1 = regression_model()

In [24]:
# train the model with the fit method, iterating over the entire training set 50 times (epochs);
# note that model_1 is trained on the raw (unnormalized) predictors and target
model_1.fit(predictors_train,
          target_train,
          epochs=50,
          validation_split=0.3, # hold out the last 30% of the training rows and report validation loss after each epoch
          verbose=0) # suppress the per-epoch progress output

<keras.callbacks.History at 0x135b2c220>
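Keras documents that `validation_split` takes the validation rows from the *end* of the data passed to `fit`, before any shuffling. Because `train_test_split` already shuffled the rows, that tail is a random sample here. A NumPy sketch of the idea; `split_like_validation_split` is a hypothetical helper, and the cut point `int(n * (1 - fraction))` is an assumption (Keras' exact rounding may differ by a row):

```python
import numpy as np


def split_like_validation_split(x, fraction):
    """Hold out the last `fraction` of rows, mimicking Keras' documented validation_split."""
    split_at = int(len(x) * (1.0 - fraction))
    return x[:split_at], x[split_at:]


x = np.arange(721)  # 721 = number of rows in train_set above
fit_part, val_part = split_like_validation_split(x, 0.3)
print(len(fit_part), len(val_part))  # 504 217
```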

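**6.2. Evaluate the model on the test data, and compute the mean squared error between the predicted concrete strength and the actual concrete strength.**  
Note that `model_1` was trained on the raw (unnormalized) data in 6.1, while the cell below evaluates it on the normalized test set (`predictors_norm_test`, `target_norm_test`). The loss reported here is therefore not on the scale the model was trained on; the MSE computed on the raw test data in 6.3 is the consistent measure for this model.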

In [29]:
eval_results = model_1.evaluate(predictors_norm_test, target_norm_test, batch_size=128)
print(eval_results)

0.8982067704200745


**6.3. Compute the mean squared error between the predicted concrete strength and the actual concrete strength. Use the mean_squared_error function from [Scikit-learn](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.mean_squared_error.html?highlight=squared#sklearn.metrics.mean_squared_error)**

In [30]:
# Predict concrete strength for the test set (raw predictors, matching how model_1 was trained)
train_results_1 = model_1.predict(predictors_test)
print(train_results_1)

[[22.145866]
 [23.772072]
 [71.65961 ]
 [19.898537]
 [43.344032]
 [63.191597]
 [24.271797]
 [38.40439 ]
 [40.072   ]
 [46.003777]
 [27.911995]
 [36.702324]
 [17.16912 ]
 [31.925156]
 [26.601173]
 [23.315208]
 [49.858894]
 [35.798347]
 [22.828041]
 [32.54103 ]
 [23.29451 ]
 [26.753242]
 [64.09351 ]
 [54.214928]
 [35.143837]
 [42.874443]
 [28.60868 ]
 [48.568123]
 [49.97991 ]
 [24.101295]
 [43.330406]
 [52.136505]
 [41.109077]
 [41.797005]
 [31.886604]
 [46.237404]
 [40.777252]
 [56.95909 ]
 [49.36022 ]
 [61.2967  ]
 [28.519936]
 [40.02732 ]
 [32.512566]
 [30.486748]
 [41.014397]
 [37.67067 ]
 [27.39347 ]
 [40.21878 ]
 [27.848717]
 [15.294745]
 [21.42244 ]
 [32.400932]
 [38.976967]
 [23.568977]
 [54.303375]
 [28.189156]
 [28.48211 ]
 [22.736076]
 [37.7322  ]
 [27.802559]
 [37.41253 ]
 [30.860863]
 [44.026848]
 [40.218803]
 [52.501244]
 [31.077126]
 [38.99331 ]
 [18.812202]
 [13.223601]
 [44.899864]
 [26.942223]
 [33.909935]
 [25.894188]
 [26.526497]
 [36.262016]
 [35.85517 ]
 [12.362998]

In [31]:
# Compute the mean squared error 
from sklearn.metrics import mean_squared_error
mean_squared_error(target_test, train_results_1)

167.40155173360313
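The mean squared error is the average of the squared differences between actual and predicted values. The sketch below computes it by hand with NumPy and checks it against scikit-learn, using the first three strengths from the data preview and hypothetical predictions. Flattening matters: `model.predict` returns shape `(n, 1)`, and subtracting that directly from a shape `(n,)` array would broadcast to an `(n, n)` matrix.

```python
import numpy as np
from sklearn.metrics import mean_squared_error


def mse(y_true, y_pred):
    """Mean of (y - y_hat)^2 over all samples."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()  # (n, 1) -> (n,)
    return float(np.mean((y_true - y_pred) ** 2))


y_true = np.array([79.99, 61.89, 40.27])     # actual strengths from the data preview
y_pred = np.array([[75.0], [65.0], [40.0]])  # hypothetical predictions, shaped like predict() output

print(mse(y_true, y_pred))
print(mean_squared_error(y_true, y_pred))    # same value
```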

## 7. Train a second model with Keras  
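The second model is built with the same `regression_model()` function (one hidden layer of 10 nodes, adam optimizer, mean squared error loss), but it is trained on the normalized data for 100 epochs per call to `fit`, repeated 50 times, recording the test MSE after each round. Because `model_2` is built only once, each call to `fit` continues training the same weights, so the 50 MSEs follow a single model through 5,000 cumulative epochs rather than coming from 50 independent runs.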

In [33]:
# build second model (built once, so training accumulates across the loop)
model_2 = regression_model()

# list to collect the test MSE after each round
model_res = []

# 50 rounds: train for 100 more epochs, then score the normalized test set
for x in range(50):
    model_2.fit(predictors_norm_train, target_norm_train, epochs=100, verbose=0)
    train_results_2 = model_2.predict(predictors_norm_test)
    model_res.append(mean_squared_error(target_norm_test, train_results_2))
print(model_res)  # sanity check

[0.27680722326357426, 0.1686243155050289, 0.1520690654233241, 0.14313035606517321, 0.14044050303940783, 0.1411089508719913, 0.14130670040020646, 0.14361622876015456, 0.14150887343744392, 0.14559335641497312, 0.14292664505623662, 0.14294222842481538, 0.1411382772290765, 0.14234924560616882, 0.14251159507466438, 0.1423863994117397, 0.1409058527507011, 0.14060769116285082, 0.14254173206959184, 0.14239355253840968, 0.14142524809195517, 0.13997269956894348, 0.14011428674852866, 0.14103364470931462, 0.14153388714817824, 0.14558457753403023, 0.14007451351589625, 0.140504467372334, 0.14055956029402125, 0.14295386152210016, 0.14026165741761096, 0.14207785552190988, 0.14129773239018212, 0.1396071983942241, 0.14099585845727766, 0.14202967149122075, 0.13965151723134026, 0.13983290112666863, 0.14253556088019928, 0.14220244249862854, 0.14266246370670516, 0.1393527943329976, 0.1410337363883798, 0.14069183396752913, 0.14181299324362728, 0.1414093240060093, 0.141238583711456, 0.14542293649454766, 0.140
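To get independent runs instead, a fresh network has to be built inside the loop (in the Keras code above, by calling `regression_model()` inside the `for` loop). The sketch below shows that pattern. To stay runnable without TensorFlow it swaps Keras for scikit-learn's `MLPRegressor` (three hidden layers of 10 ReLU units, adam, squared-error loss) on synthetic data, so it is an illustration of the loop structure, not of this notebook's model. `repeated_fresh_fits` is a hypothetical helper, and `MLPRegressor` may stop before `max_iter` epochs if the loss stops improving.

```python
import warnings

import numpy as np
from sklearn.exceptions import ConvergenceWarning
from sklearn.metrics import mean_squared_error
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in data (NOT the concrete data): 8 predictors, linear target plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))
y = X @ rng.normal(size=8) + rng.normal(scale=0.1, size=300)
X_train, X_test = X[:210], X[210:]
y_train, y_test = y[:210], y[210:]


def repeated_fresh_fits(n_repeats, hidden=(10, 10, 10), epochs=100):
    """Build a NEW network on every repetition and return one test MSE per run."""
    scores = []
    for seed in range(n_repeats):
        # Fresh estimator each time, so no weights carry over between runs.
        model = MLPRegressor(hidden_layer_sizes=hidden, activation="relu",
                             solver="adam", max_iter=epochs, random_state=seed)
        with warnings.catch_warnings():
            # max_iter is small on purpose; silence the "did not converge" warning.
            warnings.simplefilter("ignore", category=ConvergenceWarning)
            model.fit(X_train, y_train)
        scores.append(mean_squared_error(y_test, model.predict(X_test)))
    return scores


scores = repeated_fresh_fits(5)
print(np.mean(scores), np.std(scores, ddof=1))
```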

## 8. Report the mean and the standard deviation of the mean squared errors.

In [34]:
import statistics as st

mean = st.mean(model_res)
print(mean)

stdev = st.stdev(model_res)
print(stdev)

0.14512357641982598
0.01948360702821636
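Two details help when reading these numbers. `statistics.stdev` is the *sample* standard deviation (dividing by n − 1), while `np.std` divides by n unless `ddof=1` is passed. And because `model_2` was scored on standardized targets, its MSE is in units of σ² rather than squared strength units; multiplying by σ² gives a figure roughly comparable to the raw MSE in 6.3. A sketch with a hypothetical helper `to_raw_units`; it assumes the full-data `Strength` standard deviation from `describe()` (16.705742) as a stand-in for the test-set value that was actually used for normalization, so the result is approximate:

```python
import statistics as st

import numpy as np

values = [0.14, 0.15, 0.13, 0.16]  # hypothetical MSEs
# statistics.stdev divides by n - 1; NumPy matches it only with ddof=1.
assert np.isclose(st.stdev(values), np.std(values, ddof=1))


def to_raw_units(mse_normalized, target_std):
    """An MSE on z-scored targets times sigma^2 is an MSE in the target's original squared units."""
    return mse_normalized * target_std ** 2


# Assumption: full-data Strength std as a stand-in for the test-set std.
print(round(to_raw_units(0.14512357641982598, 16.705742), 1))  # 40.5
```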


# Resources

1. IBM Course [Introduction to Deep Learning & Neural Networks with Keras](https://www.coursera.org/learn/introduction-to-deep-learning-with-keras/home/info)

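2. Keras API documentation, [Sequential model](https://keras.io/api/models/sequential/)

3. Towards Data Science, [How to fix ModuleNotFoundError and ImportError](https://towardsdatascience.com/how-to-fix-modulenotfounderror-and-importerror-248ce5b69b1c)

4. TensorFlow guide, [Training and evaluation with Keras](https://www.tensorflow.org/guide/keras/train_and_evaluate)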