# Concrete data regression model with Keras layers

Concrete is widely regarded as a strong and durable material, and rightly so. But there are several ways to assess concrete strength, such as compressive, tensile, and flexural strength.

Each of these properties describes a different quality of the concrete, and which one matters most depends on the use case.

### Compressive strength of concrete
This is the most common and widely accepted measure of concrete strength, and it is used to assess the performance of a given concrete mixture. It measures the ability of concrete to withstand loads that tend to reduce its size, that is, loads that compress or crush it.

Compressive strength is tested by breaking cylindrical concrete specimens in a compression-testing machine, following ASTM (American Society for Testing and Materials) standard C39. In the United States it is usually reported in pounds per square inch (psi). In SI units it is expressed in megapascals (MPa), which is the unit of the Strength column in the dataset used below.

Compressive strength is important because it is the main criterion used to determine whether a given concrete mixture will meet the needs of a specific job.
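Compressive strength is the failure load divided by the cross-sectional area of the specimen. The sketch below works through that arithmetic for a cylinder and converts the result to psi (1 MPa ≈ 145.04 psi). The function names and the 400 kN load on a 150 mm cylinder are illustrative assumptions, not values taken from ASTM C39 or from the dataset.

```python
import math

PSI_PER_MPA = 145.0377  # 1 MPa = 1 N/mm^2 ≈ 145.0377 psi

def compressive_strength_mpa(failure_load_kn: float, diameter_mm: float) -> float:
    """Failure load divided by the circular cross-sectional area, in MPa (N/mm^2)."""
    area_mm2 = math.pi * diameter_mm ** 2 / 4.0
    return failure_load_kn * 1000.0 / area_mm2  # kN -> N, then N / mm^2

def mpa_to_psi(strength_mpa: float) -> float:
    """Convert a strength in MPa to pounds per square inch."""
    return strength_mpa * PSI_PER_MPA

# Hypothetical specimen: a 150 mm diameter cylinder that fails at 400 kN
f_c = compressive_strength_mpa(400.0, 150.0)
print(f"{f_c:.1f} MPa = {mpa_to_psi(f_c):.0f} psi")
```

Strengths in the dataset range from about 2 to 83 MPa, so this hypothetical specimen would sit in the lower half of that range.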

### Our work

We want to construct a machine learning model that predicts the compressive strength of concrete from the ingredients used to make it. To do this, we will use a neural network regression model built with Keras.

## Download and import required libraries

In [1]:
!pip install wget
import pandas as pd
import numpy as np
import keras
from keras.models import Sequential
from keras.layers import Dense
import wget
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from numpy.random import seed



Using TensorFlow backend.


## Download "Concrete" Data

We download the data from its web source with the wget module.

In [2]:
print('Beginning file download with wget module')
url = 'https://cocl.us/concrete_data'
wget.download(url, 'concrete_data.csv')

Beginning file download with wget module


'concrete_data (1).csv'

## Explore the data

We inspect the data to understand it and to check for missing or invalid values.

In [3]:
concrete_data = pd.read_csv('concrete_data.csv')
concrete_data.head()

|   | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 | 79.99 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 | 61.89 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 40.27 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 | 41.05 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 | 44.3 |


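The data has the following columns. The ingredient quantities are given in kg per cubic metre of mixture, Age is in days, and Strength is the compressive strength in MPa:

 * Cement
 * Blast Furnace Slag
 * Fly Ash
 * Water
 * Superplasticizer
 * Coarse Aggregate
 * Fine Aggregate
 * Age
 * Strength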

### Number of rows and columns

In [4]:
concrete_data.shape

(1030, 9)

### Summary statistics

In [5]:
concrete_data.describe()

|   | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
|---|---|---|---|---|---|---|---|---|---|
| count | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 | 1030.0 |
| mean | 281.167864 | 73.895825 | 54.18835 | 181.567282 | 6.20466 | 972.918932 | 773.580485 | 45.662136 | 35.817961 |
| std | 104.506364 | 86.279342 | 63.997004 | 21.354219 | 5.973841 | 77.753954 | 80.17598 | 63.169912 | 16.705742 |
| min | 102.0 | 0.0 | 0.0 | 121.8 | 0.0 | 801.0 | 594.0 | 1.0 | 2.33 |
| 25% | 192.375 | 0.0 | 0.0 | 164.9 | 0.0 | 932.0 | 730.95 | 7.0 | 23.71 |
| 50% | 272.9 | 22.0 | 0.0 | 185.0 | 6.4 | 968.0 | 779.5 | 28.0 | 34.445 |
| 75% | 350.0 | 142.95 | 118.3 | 192.0 | 10.2 | 1029.4 | 824.0 | 56.0 | 46.135 |
| max | 540.0 | 359.4 | 200.1 | 247.0 | 32.2 | 1145.0 | 992.6 | 365.0 | 82.6 |


### Missing values

In [6]:
concrete_data.isnull().sum()

Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64

Our data is clean: no column has missing values.
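`isnull()` only detects missing entries. Values that are present but implausible, such as negative quantities, and rows that appear more than once need separate checks. Below is a minimal sketch of such checks. `basic_quality_report` is a hypothetical helper, and the demo frame is made up rather than taken from the dataset.

```python
import pandas as pd

def basic_quality_report(df: pd.DataFrame) -> dict:
    """Count missing entries, negative numeric entries, and fully duplicated rows."""
    numeric = df.select_dtypes(include="number")
    return {
        "missing": int(df.isnull().sum().sum()),
        "negative": int((numeric < 0).sum().sum()),
        "duplicate_rows": int(df.duplicated().sum()),
    }

# Made-up frame: rows 0 and 1 are identical, and row 2 has a negative Cement value
demo = pd.DataFrame({"Cement": [540.0, 540.0, -1.0], "Age": [28, 28, 7]})
print(basic_quality_report(demo))
```

On this notebook's data it would be called as `basic_quality_report(concrete_data)`.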

## Split the data into predictors and target

We separate the input features (predictors) from the value we want to predict (target) so that both can be passed to the Keras model.

In [0]:
concrete_data_columns = concrete_data.columns

predictors = concrete_data[concrete_data_columns[concrete_data_columns != 'Strength']] # all columns except Strength
target = concrete_data['Strength'] # Strength column

In [8]:
predictors.head()

|   | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age |
|---|---|---|---|---|---|---|---|---|
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 |


In [9]:
target.head()

0    79.99
1    61.89
2    40.27
3    41.05
4    44.30
Name: Strength, dtype: float64

### Number of predictors 

In [0]:
n_cols = predictors.shape[1] # number of predictors

In [0]:
seed(1)
# hold out 30% of the rows as a test set
X_train, X_test, y_train, y_test = train_test_split(predictors, target, test_size=0.3, random_state=0)
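The summary statistics show the predictors on very different scales. For example, Coarse Aggregate averages about 973 while Superplasticizer averages about 6, and this notebook trains on the raw values. A common optional step, not used in this notebook, is to standardize each predictor with the training split's mean and standard deviation only. Using training statistics alone keeps information from the test split from leaking into preprocessing. The sketch below shows this with pandas; `standardize` is a hypothetical helper.

```python
import pandas as pd

def standardize(X_train: pd.DataFrame, X_test: pd.DataFrame):
    """Scale both splits with the training split's column means and standard deviations."""
    mean = X_train.mean()
    std = X_train.std()  # pandas default: sample standard deviation (ddof=1)
    return (X_train - mean) / std, (X_test - mean) / std

# Usage with the splits above (not run in this notebook):
# X_train_norm, X_test_norm = standardize(X_train, X_test)
```

A column with zero variance in the training split would produce a division by zero here. None of the concrete predictors is constant, per the summary statistics.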

## Building the model

In [0]:
# define regression model
def regression_model():
    # create model
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(1))
    
    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error',metrics=['mape'])
    return model
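The network above has one hidden layer of 10 ReLU units followed by a single linear output unit. The NumPy sketch below shows what such a network computes; it is not Keras's internal code, and it uses randomly drawn weights rather than Keras's default initializers. The parameter count is 8 × 10 weights plus 10 biases in the hidden layer and 10 × 1 weights plus 1 bias in the output layer, 101 in total, which should match `model.summary()` for `n_cols = 8`.

```python
import numpy as np

n_cols, n_hidden = 8, 10
rng = np.random.default_rng(0)

# Random weights standing in for the values that training would learn
W1 = rng.normal(size=(n_cols, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(n_hidden, 1))
b2 = np.zeros(1)

def forward(X: np.ndarray) -> np.ndarray:
    """Dense(10, relu) followed by Dense(1) with a linear activation."""
    hidden = np.maximum(0.0, X @ W1 + b1)  # ReLU
    return hidden @ W2 + b2                # shape (n_samples, 1)

n_params = W1.size + b1.size + W2.size + b2.size
X_demo = rng.normal(size=(5, n_cols))
print(forward(X_demo).shape, n_params)  # → (5, 1) 101
```

The output has shape `(n_samples, 1)`, just as `model.predict` returns a 2-D array with one column.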

In [13]:
# build the model
model = regression_model()

In [14]:
history=model.fit(X_train, y_train, epochs=50, batch_size=10,verbose=0)
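With `test_size=0.3`, `train_test_split` puts ceil(0.3 × 1030) = 309 rows in the test set and the remaining 721 in the training set. With `batch_size=10`, each epoch runs ceil(721 / 10) = 73 gradient updates, and the last batch holds a single row. Over 50 epochs that gives 3,650 updates. The same arithmetic as a small check:

```python
import math

n_rows, test_size = 1030, 0.3
batch_size, epochs = 10, 50

n_test = math.ceil(test_size * n_rows)   # train_test_split rounds the test size up
n_train = n_rows - n_test
steps_per_epoch = math.ceil(n_train / batch_size)  # the final partial batch still counts
total_updates = steps_per_epoch * epochs
print(n_test, n_train, steps_per_epoch, total_updates)  # → 309 721 73 3650
```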

In [0]:
y_pred=model.predict(X_test)

In [16]:
mean_squared_error(y_test, y_pred)

131.35732400190312
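Mean squared error averages the squared differences between true and predicted strengths: MSE = (1/n) Σ (yᵢ − ŷᵢ)². `model.predict` returns an array of shape (n, 1), while `y_test` is one-dimensional. scikit-learn's `mean_squared_error` handles that mismatch itself, but computing the formula by hand with NumPy requires flattening the predictions first. Otherwise broadcasting silently produces an n × n array of differences. A minimal hand-rolled version, where `mse` is a hypothetical helper:

```python
import numpy as np

def mse(y_true, y_pred) -> float:
    """Mean squared error; ravel() turns an (n, 1) prediction array into shape (n,)."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must contain the same number of values")
    return float(np.mean((y_true - y_pred) ** 2))

targets = np.array([1.0, 2.0, 3.0])
preds = np.array([[1.0], [2.0], [5.0]])  # shaped like model.predict output
print(mse(targets, preds))               # squared errors 0, 0, 4 -> 4/3
# Without ravel(), the subtraction broadcasts to a (3, 3) array
print((targets - preds).shape)           # → (3, 3)
```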

Next, we repeat the split, training, and prediction 50 times, record the mean squared error on the test split each time, and then compute the mean and standard deviation of the resulting 50 errors.

In [0]:
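seed(1)
mean_squared_err = list()
for i in range(50):
  # new random split each iteration, drawn from NumPy's global RNG seeded above
  X_train, X_test, y_train, y_test = train_test_split(predictors, target, test_size=0.3)
  # reuses the `model` built above, so its weights carry over between iterations
  history = model.fit(X_train, y_train, epochs=50, batch_size=10, verbose=0)
  y_pred = model.predict(X_test)
  mean_squared_err.append(mean_squared_error(y_test, y_pred))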

In [18]:
mean_squared_err

[120.42851664892494,
 115.68514232145903,
 119.68598000102334,
 106.96353674147262,
 107.14875778873785,
 115.06377369164844,
 118.32176091767609,
 176.6372488961441,
 66.5239018541125,
 76.734842475407,
 78.84514194848187,
 99.92439325115845,
 58.33001518039863,
 63.71189582510629,
 48.279096321424895,
 51.21502901254171,
 70.84656885188498,
 50.16441610185129,
 54.325413005497275,
 44.463348448800005,
 50.96525386943877,
 45.86349956614945,
 58.86715308892563,
 42.455294709904834,
 57.419441323832764,
 41.32427526681404,
 63.15556727279783,
 65.82790822642556,
 48.682639474244695,
 57.07582654255623,
 43.5722819457338,
 39.77843146032425,
 45.45254176106112,
 44.71834763801584,
 51.225739048208716,
 44.47922794162245,
 43.631293209790314,
 42.281083526486704,
 57.061823059145745,
 42.30963745221265,
 41.00970796661009,
 48.11185496972523,
 71.41487375610879,
 43.436572860753365,
 50.65891217711817,
 41.84254124061498,
 47.04821174134788,
 39.217560138196255,
 45.152734012099714,
 45.

In [19]:
np.mean(mean_squared_err)

64.06433227832582

In [20]:
np.std(mean_squared_err)

29.244131313241237
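`np.std` computes the population standard deviation (`ddof=0`) by default, whereas pandas' `Series.std` defaults to the sample standard deviation (`ddof=1`). For 50 values the sample version is larger by a factor of √(50/49), about 1%, so it is worth stating which one is reported. A comparison on a toy list:

```python
import numpy as np

values = [1.0, 2.0, 3.0, 4.0]
pop_std = np.std(values)             # divides the sum of squared deviations by n
sample_std = np.std(values, ddof=1)  # divides by n - 1
print(np.mean(values), pop_std, sample_std)  # mean 2.5, sqrt(1.25), sqrt(5/3)
```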

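We obtained a mean squared error of 64.06 averaged over the 50 repetitions, with a standard deviation of 29.24. Because the loop keeps training the same `model` object, the errors fall from above 100 in the first eight repetitions to below 60 in all but one of the last twenty listed values. Each new test split also overlaps rows the model was trained on in earlier iterations. The standard deviation therefore mixes split-to-split variation with the effect of continued training.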
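Building a fresh, untrained model in each repetition makes every test MSE an estimate for the same training budget. It also avoids evaluating on rows that an earlier iteration has already trained on. The sketch below implements that variant. `repeated_holdout_mse` is a hypothetical helper that accepts any zero-argument model factory whose models expose `fit` and `predict`, such as `regression_model` above. Its results would differ from the numbers reported here. The demo uses scikit-learn's `DummyRegressor` (which predicts the training mean) only to show that the loop runs.

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

def repeated_holdout_mse(X, y, build_model, n_repeats=50, test_size=0.3,
                         fit_kwargs=None, seed=0):
    """Return one test MSE per repetition, training a newly built model each time."""
    fit_kwargs = fit_kwargs or {}
    errors = []
    for i in range(n_repeats):
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=test_size, random_state=seed + i)  # reproducible splits
        model = build_model()  # fresh, untrained model for every repetition
        model.fit(X_train, y_train, **fit_kwargs)
        y_pred = np.asarray(model.predict(X_test)).ravel()
        errors.append(mean_squared_error(y_test, y_pred))
    return errors

# Demo on made-up data with a baseline model
X_demo = np.arange(40, dtype=float).reshape(20, 2)
y_demo = np.arange(20, dtype=float)
demo_errors = repeated_holdout_mse(X_demo, y_demo, DummyRegressor, n_repeats=5)
print(np.mean(demo_errors), np.std(demo_errors))

# Usage in this notebook (not run here):
# errors = repeated_holdout_mse(predictors, target, regression_model,
#                               fit_kwargs=dict(epochs=50, batch_size=10, verbose=0))
# print(np.mean(errors), np.std(errors))
```

Fitting 50 separate Keras models costs the same number of gradient updates as the original loop. Each model, however, starts from its own initial weights, so some of the spread will come from initialization as well as from the split.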