# Project Instructions

## Task A. Build a baseline model (5 marks) 

Use the Keras library to build a neural network with the following:

- One hidden layer of 10 nodes, and a ReLU activation function

- Use the adam optimizer and the mean squared error as the loss function.

1. Randomly split the data into training and test sets, holding out 30% of the data for testing. You can use the train_test_split helper function from Scikit-learn.

2. Train the model on the training data using 50 epochs.

3. Evaluate the model on the test data and compute the mean squared error between the predicted concrete strength and the actual concrete strength. You can use the mean_squared_error function from Scikit-learn.

4. Repeat steps 1 - 3, 50 times, i.e., create a list of 50 mean squared errors.

5. Report the mean and the standard deviation of the mean squared errors.

Submit your Jupyter Notebook with your code and comments.

## Initial Project Setup

1. Install all necessary Python packages.

In [1]:
%pip install numpy
%pip install pandas
%pip install keras
%pip install scikit-learn
%pip install tensorflow

Note: you may need to restart the kernel to use updated packages.

2. Import `pandas`, `numpy`, and `keras` into the project.

In [2]:
import pandas as pd
import numpy as np
import keras

3. Import `Sequential` and `Dense` from the `keras` library.

In [3]:
from keras.models import Sequential
from keras.layers import Dense

## Setup Data Frame
We will use the concrete data set, loaded from a CSV file and assigned to the `concrete_data` variable.

In [4]:
concrete_data = pd.read_csv('https://cocl.us/concrete_data')


The first five rows of the data set can be displayed as a table with the `head` method.

In [5]:
concrete_data.head()

|   | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 | 79.99 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 | 61.89 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 40.27 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 | 41.05 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 | 44.3 |


The `describe` method gives summary statistics that show how each column is distributed: count, mean, standard deviation, min, max, and quartiles.

In [6]:
concrete_data.describe()

|   | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| count | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 |
| mean | 281.167864 | 73.895825 | 54.18835 | 181.567282 | 6.20466 | 972.918932 | 773.580485 | 45.662136 | 35.817961 |
| std | 104.506364 | 86.279342 | 63.997004 | 21.354219 | 5.973841 | 77.753954 | 80.17598 | 63.169912 | 16.705742 |
| min | 102.0 | 0.0 | 0.0 | 121.8 | 0.0 | 801.0 | 594.0 | 1.0 | 2.33 |
| 25% | 192.375 | 0.0 | 0.0 | 164.9 | 0.0 | 932.0 | 730.95 | 7.0 | 23.71 |
| 50% | 272.9 | 22.0 | 0.0 | 185.0 | 6.4 | 968.0 | 779.5 | 28.0 | 34.445 |
| 75% | 350.0 | 142.95 | 118.3 | 192.0 | 10.2 | 1029.4 | 824.0 | 56.0 | 46.135 |
| max | 540.0 | 359.4 | 200.1 | 247.0 | 32.2 | 1145.0 | 992.6 | 365.0 | 82.6 |


## Create Model with Neural Network

A function called `regression_model` builds a `Sequential` model with one hidden layer of 10 neurons using the `ReLU` activation function, followed by a single-neuron output layer that produces the predicted strength. The model is then compiled with the `Adam` optimizer and the mean squared error (`MSE`) loss function.

In [7]:
def regression_model():
    model = Sequential()
    # Hidden layer: 10 nodes with ReLU activation.
    model.add(Dense(10, activation='relu'))
    # Output layer: one node, so the model predicts a single strength value per sample.
    model.add(Dense(1))
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model

The function is called and the resulting model is assigned to the variable `model`.

In [8]:
model = regression_model()
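
As a rough check on the size of such a network, the number of trainable parameters can be worked out by hand. The sketch below assumes 8 predictor columns (every column except `Strength`) and a single-unit output layer after the 10-node hidden layer; `dense_param_count` is a hypothetical helper written for this illustration, not a Keras function.

```python
def dense_param_count(n_inputs, n_units):
    """Weights plus biases of one fully connected layer."""
    return n_inputs * n_units + n_units


n_predictors = 8  # assumption: all columns other than 'Strength'

hidden = dense_param_count(n_predictors, 10)  # 8*10 weights + 10 biases
output = dense_param_count(10, 1)             # 10*1 weights + 1 bias
total = hidden + output

print(hidden, output, total)  # 90 11 101
```

Once the model has been built by a first call to `fit`, `model.summary()` should report the parameter counts Keras actually allocated, which can be compared against this arithmetic.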

## Create Predictors and Target

Create the predictors and target. First we need to get all of the data columns from the concrete data set.

In [9]:
concrete_data_columns = concrete_data.columns
print(concrete_data_columns.values)

['Cement' 'Blast Furnace Slag' 'Fly Ash' 'Water' 'Superplasticizer'
 'Coarse Aggregate' 'Fine Aggregate' 'Age' 'Strength']


### Predictors
The predictors will be all the columns minus the column 'Strength'.

In [10]:

predictors = concrete_data[concrete_data_columns[concrete_data_columns != 'Strength']]
predictors.head()

|   | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 |
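
The selection in `In [10]` keeps every column whose name is not `'Strength'`. A minimal plain-Python sketch of the same filter is shown here, using the column names printed earlier; the variable names `all_columns` and `predictor_columns` are illustrative and do not appear in the notebook.

```python
# Column names as printed by concrete_data.columns.values in the notebook.
all_columns = [
    'Cement', 'Blast Furnace Slag', 'Fly Ash', 'Water', 'Superplasticizer',
    'Coarse Aggregate', 'Fine Aggregate', 'Age', 'Strength',
]

# Keep every name except the target column.
predictor_columns = [name for name in all_columns if name != 'Strength']

print(len(predictor_columns))  # 8
print(predictor_columns)
```

With pandas, `concrete_data.drop(columns='Strength')` is a shorter way to get the same predictor frame.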


### Target
The target is the column we want to predict. In this case that is the `Strength` column, so it is assigned to the `target` variable.

In [11]:
target = concrete_data['Strength']
target.head()

0    79.99
1    61.89
2    40.27
3    41.05
4    44.30
Name: Strength, dtype: float64

### Train Model
The model is trained on `predictors` and `target` for 50 epochs. With `validation_split=0.3`, Keras trains on the first 70% of the rows and holds out the last 30% for validation; the held-out rows are not chosen at random. The predictors are used as-is, without normalization.

In [12]:
model.fit(predictors, target, validation_split=0.3, epochs=50, verbose=2)

Epoch 1/50
23/23 - 0s - loss: 31009.3516 - val_loss: 25295.7598 - 194ms/epoch - 8ms/step
Epoch 2/50
23/23 - 0s - loss: 22996.2441 - val_loss: 18571.9766 - 23ms/epoch - 1ms/step
Epoch 3/50
23/23 - 0s - loss: 17060.8477 - val_loss: 13692.4932 - 24ms/epoch - 1ms/step
Epoch 4/50
23/23 - 0s - loss: 12779.2969 - val_loss: 10214.2236 - 23ms/epoch - 1ms/step
Epoch 5/50
23/23 - 0s - loss: 9698.0537 - val_loss: 7693.5557 - 22ms/epoch - 971us/step
Epoch 6/50
23/23 - 0s - loss: 7454.1909 - val_loss: 5866.1797 - 22ms/epoch - 940us/step
Epoch 7/50
23/23 - 0s - loss: 5815.1616 - val_loss: 4521.8823 - 22ms/epoch - 950us/step
Epoch 8/50
23/23 - 0s - loss: 4615.5913 - val_loss: 3548.7568 - 22ms/epoch - 964us/step
Epoch 9/50
23/23 - 0s - loss: 3749.3704 - val_loss: 2851.4424 - 22ms/epoch - 947us/step
Epoch 10/50
23/23 - 0s - loss: 3123.0859 - val_loss: 2357.5493 - 21ms/epoch - 924us/step
Epoch 11/50
23/23 - 0s - loss: 2679.3640 - val_loss: 2015.8817 - 21ms/epoch - 926us/step
Epoch 12/50
23/23 - 0s - loss

<keras.src.callbacks.History at 0x2a54e6ed0>
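
The `23/23` in the training log can be reproduced by hand. The sketch below assumes the 1030 rows reported by `describe`, the Keras default batch size of 32 (no `batch_size` was passed to `fit`), and that `validation_split` holds out the final fraction of rows. It illustrates the arithmetic only and is not Keras's own code.

```python
import math

n_rows = 1030           # row count reported by describe()
validation_split = 0.3  # value passed to model.fit
batch_size = 32         # assumption: Keras default when batch_size is not given

n_train = int(n_rows * (1 - validation_split))     # leading rows used for training
n_val = n_rows - n_train                           # trailing rows held out
steps_per_epoch = math.ceil(n_train / batch_size)  # batches per epoch

print(n_train, n_val, steps_per_epoch)  # 721 309 23
```

Because the held-out rows are simply the tail of the data frame, this split is not a random sample, unlike the one produced by `train_test_split`.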

### Find the Mean and Standard Deviation
Import `train_test_split` from `sklearn.model_selection` and `mean_squared_error` from `sklearn.metrics`.

In [13]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
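
For reference, the mean squared error is the average of the squared differences between actual and predicted values. The sketch below is a plain-Python illustration of that formula, not the Scikit-learn implementation; the `mse` helper and the `predicted` values are hypothetical, while `actual` reuses the first five `Strength` values shown earlier.

```python
def mse(y_true, y_pred):
    """Average of squared differences between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


actual = [79.99, 61.89, 40.27, 41.05, 44.30]  # first five Strength values
predicted = [75.0, 60.0, 45.0, 40.0, 50.0]    # hypothetical predictions

print(mse(actual, predicted))
```

Because the errors are squared, the result is in squared strength units, and a few large misses dominate the average.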

Create a variable for loop iterations and a variable to hold the MSE values generated from the loop.

In [14]:
num_iterations = 50
mse_values = []

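Run the for loop with 50 iterations. Each iteration makes a new random 70/30 split, trains a fresh model on the training data for 50 epochs, and predicts on the test data. The prediction array is reduced to one value per sample (`y_pred_aggregate`) so it can be passed to the `mean_squared_error` function along with `y_test`. The MSE value is appended to `mse_values`.

In [15]:
for _ in range(num_iterations):
    # Step 1: a new random 70/30 split each iteration (no fixed random_state).
    x_train, x_test, y_train, y_test = train_test_split(predictors, target, test_size=0.3)
    # Step 2: build a fresh model and train it on the training data only.
    model = regression_model()
    model.fit(x_train, y_train, epochs=50, verbose=0)
    # Step 3: predict on the held-out test data and reduce to one value per sample.
    y_pred = model.predict(x_test, verbose=0)
    y_pred_aggregate = np.mean(y_pred, axis=1)
    mse_value = mean_squared_error(y_test, y_pred_aggregate)
    mse_values.append(mse_value)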
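
Whether repeated splits differ depends on the random seed. The sketch below uses a hypothetical stand-in, `split_indices`, built on the standard-library `random` module rather than Scikit-learn, to show that a fixed seed reproduces the same test set every time while a different seed gives a different one.

```python
import math
import random


def split_indices(n_rows, test_fraction, seed=None):
    """Stand-in for a random train/test split: returns (train_idx, test_idx)."""
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    n_test = math.ceil(n_rows * test_fraction)
    return indices[n_test:], indices[:n_test]


_, test_a = split_indices(1030, 0.3, seed=42)
_, test_b = split_indices(1030, 0.3, seed=42)
_, test_c = split_indices(1030, 0.3, seed=7)

print(test_a == test_b)  # same seed, same test rows
print(test_a == test_c)  # different seed, different test rows
```

The same reasoning applies to `train_test_split`: passing the same `random_state` on every iteration repeats one split, so the spread of the resulting MSE values says nothing about sensitivity to the split.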
    




Calculate the mean and standard deviation of the 50 MSE values.

In [16]:
mean_mse = np.mean(mse_values)
std_mse = np.std(mse_values)
print(f"Mean squared error: {mean_mse}")
print(f"Standard deviation of MSE: {std_mse}")

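Mean squared error: 800.6198574964072
Standard deviation of MSE: 1.1368683772161603e-13

A standard deviation of about 1e-13 is floating-point noise. It arises when every iteration evaluates the same trained model on the same split (a fixed `random_state`), and the values recorded above come from such a run. When each iteration resplits the data and retrains the model, the 50 MSE values spread out and both figures change from run to run.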
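
Note that `np.std` computes the population standard deviation by default (it divides by `n`, that is, `ddof=0`). The sketch below uses the standard-library `statistics` module as a stand-in to show how that differs from the sample standard deviation (divide by `n - 1`); the `mse_samples` values are hypothetical and are not results from this notebook.

```python
import statistics

mse_samples = [100.0, 110.0, 120.0, 130.0, 140.0]  # hypothetical MSE values

mean_value = statistics.fmean(mse_samples)
population_std = statistics.pstdev(mse_samples)  # divides by n, like np.std(x)
sample_std = statistics.stdev(mse_samples)       # divides by n - 1, like np.std(x, ddof=1)

print(mean_value)      # 120.0
print(population_std)  # sqrt(200), about 14.14
print(sample_std)      # sqrt(250), about 15.81
```

With 50 values the two estimates differ by roughly 1%, so either is reasonable to report as long as the choice is stated.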


And we are done!