# Keras Regression Model - Concrete Strength Data

## Download and Clean Dataset

In [2]:
#!pip install numpy==1.21.4
#!pip install pandas==1.3.4
#!pip install keras==2.1.6

In [3]:
import pandas as pd
import numpy as np

The dataset has eight predictors of concrete strength:

1. Cement
2. Blast Furnace Slag
3. Fly Ash
4. Water
5. Superplasticizer
6. Coarse Aggregate
7. Fine Aggregate
8. Age

In [4]:
data_path = "https://cocl.us/concrete_data"
concrete_data = pd.read_csv(data_path)
concrete_data.head()



| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 | 79.99 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 | 61.89 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 40.27 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 | 41.05 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 | 44.3 |


In [5]:
concrete_data.shape

(1030, 9)

In [7]:
concrete_data.describe()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| count | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 |
| mean | 281.167864 | 73.895825 | 54.18835 | 181.567282 | 6.20466 | 972.918932 | 773.580485 | 45.662136 | 35.817961 |
| std | 104.506364 | 86.279342 | 63.997004 | 21.354219 | 5.973841 | 77.753954 | 80.17598 | 63.169912 | 16.705742 |
| min | 102.0 | 0.0 | 0.0 | 121.8 | 0.0 | 801.0 | 594.0 | 1.0 | 2.33 |
| 25% | 192.375 | 0.0 | 0.0 | 164.9 | 0.0 | 932.0 | 730.95 | 7.0 | 23.71 |
| 50% | 272.9 | 22.0 | 0.0 | 185.0 | 6.4 | 968.0 | 779.5 | 28.0 | 34.445 |
| 75% | 350.0 | 142.95 | 118.3 | 192.0 | 10.2 | 1029.4 | 824.0 | 56.0 | 46.135 |
| max | 540.0 | 359.4 | 200.1 | 247.0 | 32.2 | 1145.0 | 992.6 | 365.0 | 82.6 |


Check dataset for missing values.

In [8]:
concrete_data.isnull().sum()

Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64

### Separate Predictors and Target

The target variable in this problem is the concrete sample strength (the `Strength` column). The predictors are all the other columns. The train/test split itself is done later, inside the training loop.

In [10]:
concrete_data_columns = concrete_data.columns

predictors = concrete_data[concrete_data_columns[concrete_data_columns != 'Strength']] # all columns except Strength
target = concrete_data['Strength'] # Strength column

Check the predictors DataFrame and the target (`Strength`) Series:

In [11]:
predictors.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 |


In [12]:
target.head()

0    79.99
1    61.89
2    40.27
3    41.05
4    44.30
Name: Strength, dtype: float64

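Normalize the predictors by subtracting each column's mean and dividing by its standard deviation. By default, pandas' `std()` uses the sample standard deviation (`ddof=1`). Note that these statistics are computed on the full dataset, before any train/test split.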

In [13]:
predictors_norm = (predictors - predictors.mean()) / predictors.std()
predictors_norm.head()

| | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 2.476712 | -0.856472 | -0.846733 | -0.916319 | -0.620147 | 0.862735 | -1.217079 | -0.279597 |
| 1 | 2.476712 | -0.856472 | -0.846733 | -0.916319 | -0.620147 | 1.055651 | -1.217079 | -0.279597 |
| 2 | 0.491187 | 0.79514 | -0.846733 | 2.174405 | -1.038638 | -0.526262 | -2.239829 | 3.55134 |
| 3 | 0.491187 | 0.79514 | -0.846733 | 2.174405 | -1.038638 | -0.526262 | -2.239829 | 5.055221 |
| 4 | -0.790075 | 0.678079 | -0.846733 | 0.488555 | -1.038638 | 0.070492 | 0.647569 | 4.976069 |
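
As a quick check of the normalization, the sketch below recomputes two entries of the first normalized row by hand, using the mean and standard deviation reported by `describe()` earlier. The helper name `z_score` is illustrative and not part of the notebook.

```python
def z_score(value, mean, std):
    """Standardize one value: subtract the mean, divide by the standard deviation."""
    return (value - mean) / std

# Column statistics copied from the describe() output.
cement_mean, cement_std = 281.167864, 104.506364
age_mean, age_std = 45.662136, 63.169912

# Row 0 of the dataset has Cement = 540.0 and Age = 28.
cement_z = z_score(540.0, cement_mean, cement_std)
age_z = z_score(28, age_mean, age_std)

print(round(cement_z, 6))
print(round(age_z, 6))
```

Up to the rounding of the summary statistics, the two results should agree with the `Cement` and `Age` entries of row 0 in the normalized table (2.476712 and -0.279597).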


Save the number of predictor columns; it sets the input size of the network.

In [14]:
n_cols = predictors_norm.shape[1] # number of predictors

## Import Keras


Import Keras and supporting packages.

In [16]:
import keras

In [17]:
from keras.models import Sequential
from keras.layers import Dense

## Build Neural Network

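The function below defines a regression model with one hidden layer of 10 nodes using the ReLU activation function, followed by a single linear output node. The model is compiled with the Adam optimizer and a mean squared error loss.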

In [18]:
# define regression model
def regression_model():
    # create model
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(1))
    
    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model
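
To make the size of this network concrete, the sketch below counts its trainable parameters by hand. It assumes `n_cols` is 8 (the eight predictor columns) and relies on the fact that a fully connected layer has one weight per input-output pair plus one bias per output unit. The helper `dense_param_count` is illustrative only; in Keras, `model.summary()` reports these counts directly.

```python
def dense_param_count(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

n_cols = 8  # assumption: the eight predictor columns of the dataset

hidden_params = dense_param_count(n_cols, 10)  # Dense(10, activation='relu')
output_params = dense_param_count(10, 1)       # Dense(1)
total_params = hidden_params + output_params

print(hidden_params, output_params, total_params)  # 90 11 101
```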

## Train and Test Network

In [21]:
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

In [24]:
mse_list = []  # Create an empty list to store the mean squared errors

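Train and evaluate the model repeatedly. Each of the 50 iterations randomly holds out 30% of the data as a test set, trains a fresh model on the remaining 70% for 50 epochs, and records the mean squared error on the test set.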

In [25]:
# Repeat the process 50 times
for i in range(50):
    # Randomly split the data into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(predictors_norm, target, test_size=0.3)

    # Train the model on the training data using 50 epochs
    model = regression_model()
    model.fit(X_train, y_train, epochs=50, verbose=0)

    # Evaluate the model on the test data and compute the mean squared error
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)

    # Add the mean squared error to the list
    mse_list.append(mse)
    print(f"Iteration {i+1}: Mean squared error = {mse}")

print("All mean squared errors:", mse_list)

Iteration 1: Mean squared error = 540.7146201968236
Iteration 2: Mean squared error = 602.3075599575016
Iteration 3: Mean squared error = 350.8947175670311
Iteration 4: Mean squared error = 286.4385458781763
Iteration 5: Mean squared error = 323.75695206880084
Iteration 6: Mean squared error = 334.53342431971055
Iteration 7: Mean squared error = 283.22514417428323
Iteration 8: Mean squared error = 442.889688766401
Iteration 9: Mean squared error = 324.9148768165848
Iteration 10: Mean squared error = 419.0122000619782
Iteration 11: Mean squared error = 397.7259573528864
Iteration 12: Mean squared error = 399.3833615949957
Iteration 13: Mean squared error = 403.97065827114136
Iteration 14: Mean squared error = 394.4338396589823
Iteration 15: Mean squared error = 347.0995622277851
Iteration 16: Mean squared error = 281.95172444174614
Iteration 17: Mean squared error = 483.37684558798657
Iteration 18: Mean squared error = 311.286588454801
Iteration 19: Mean squared error = 306.309552594982
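
The loop draws a fresh random 70/30 split on every iteration. The sketch below illustrates that idea using only the standard library. It is not scikit-learn's implementation: the helper name `holdout_indices`, the fixed seed, and the rounding rule for the test-set size are assumptions made for illustration.

```python
import random

def holdout_indices(n_samples, test_fraction, rng):
    """Shuffle row indices and split them into (train, test) index lists."""
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = round(n_samples * test_fraction)  # assumption: simple rounding
    return indices[n_test:], indices[:n_test]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
n_samples = 1030        # number of rows reported by concrete_data.shape

# Three independent splits, mirroring the first three loop iterations.
splits = [holdout_indices(n_samples, 0.3, rng) for _ in range(3)]
for train_idx, test_idx in splits:
    print(len(train_idx), len(test_idx))
```

Because every iteration sees a different split and a freshly initialized model, the spread of the recorded errors reflects both sources of randomness. If repeatable splits are wanted in the notebook itself, `train_test_split` accepts a `random_state` argument.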

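Each iteration evaluates the freshly trained model on its held-out test data and records the mean squared error between the predicted and the actual concrete strength. The iteration log shown here stops at iteration 19; the summary statistics below are computed from `mse_list`.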
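
For reference, mean squared error is the average of the squared differences between actual and predicted values. The sketch below computes it by hand. The actual values are the first three `Strength` entries shown earlier; the predicted values are made up purely for illustration and are not model output.

```python
def mse_by_hand(y_true, y_pred):
    """Average of squared differences between actual and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [79.99, 61.89, 40.27]  # first three Strength values from the dataset
y_pred = [69.99, 66.89, 40.27]  # hypothetical predictions, for illustration only

# Errors are 10, -5 and 0, so the result is (100 + 25 + 0) / 3.
example_mse = mse_by_hand(y_true, y_pred)
print(example_mse)  # approximately 41.67
```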

## Compute Mean and Standard Deviation of the Mean Squared Errors

In [26]:
import statistics

# Calculate the mean and standard deviation of the mean squared errors
mse_mean = statistics.mean(mse_list)
mse_stddev = statistics.stdev(mse_list)

# Report the mean and standard deviation of the mean squared errors
print(f"Mean of the mean squared errors: {mse_mean}")
print(f"Standard deviation of the mean squared errors: {mse_stddev}")

Mean of the mean squared errors: 363.89513912911553
Standard deviation of the mean squared errors: 76.42505331357307
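
Because the target is not normalized, these errors are in squared strength units. One way to put them in context, sketched below, is to take the square root (RMSE) and to compare the mean MSE against the variance of `Strength`, which is roughly the MSE of a model that always predicts the mean. The numbers are copied from the outputs above. The comparison is only approximate, since `describe()` reports the sample standard deviation over the full dataset rather than over any single test split.

```python
import math

mse_mean = 363.89513912911553  # mean test MSE reported above
strength_std = 16.705742       # std of Strength from describe()

rmse = math.sqrt(mse_mean)         # typical error, in strength units
baseline_mse = strength_std ** 2   # approx. MSE of always predicting the mean
ratio = mse_mean / baseline_mse

print(f"RMSE: {rmse:.2f}")
print(f"Mean-predictor baseline MSE: {baseline_mse:.2f}")
print(f"Model MSE / baseline MSE: {ratio:.2f}")
```

Here the ratio comes out above 1, which suggests that with these settings the network does no better on average than predicting the mean strength. Training for more epochs or adding capacity are the usual things to try next.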
