# Assignment: Compressive Strength of Concrete Problem


### Abstract: 

Concrete is the most important material in civil engineering. The compressive strength of concrete (its capacity to bear a load) is a highly nonlinear function of its age and ingredients.  <br><br>

<table border="1"  cellpadding="6" bordercolor="red">
	<tbody>
        <tr>
		<td bgcolor="#DDEEFF"><p class="normal"><b>Data Set Characteristics:&nbsp;&nbsp;</b></p></td>
		<td><p class="normal">Multivariate</p></td>
		<td bgcolor="#DDEEFF"><p class="normal"><b>Number of Instances:</b></p></td>
		<td><p class="normal">1030</p></td>
		<td bgcolor="#DDEEFF"><p class="normal"><b>Area:</b></p></td>
		<td><p class="normal">Physical</p></td>
        </tr>
     </tbody>
    </table>
<table border="1" cellpadding="6">
    <tbody>
        <tr>
            <td bgcolor="#DDEEFF"><p class="normal"><b>Attribute Characteristics:</b></p></td>
            <td><p class="normal">Real</p></td>
            <td bgcolor="#DDEEFF"><p class="normal"><b>Number of Attributes:</b></p></td>
            <td><p class="normal">9</p></td>
            <td bgcolor="#DDEEFF"><p class="normal"><b>Date Donated</b></p></td>
            <td><p class="normal">2007-08-03</p></td>
        </tr>
     </tbody>
    </table>
<table border="1" cellpadding="6">	
    <tbody>
    <tr>
		<td bgcolor="#DDEEFF"><p class="normal"><b>Associated Tasks:</b></p></td>
		<td><p class="normal">Regression</p></td>
		<td bgcolor="#DDEEFF"><p class="normal"><b>Missing Values?</b></p></td>
		<td><p class="normal">N/A</p></td>
	</tr>
    </tbody>
    </table>

### Description:
| Feature Name | Data Type | Measurement | Description |
| -- | -- | -- | -- |
| Cement (component 1) | quantitative | kg in a m³ mixture | Input Variable |
| Blast Furnace Slag (component 2) | quantitative | kg in a m³ mixture | Input Variable |
| Fly Ash (component 3) | quantitative | kg in a m³ mixture | Input Variable |
| Water (component 4) | quantitative | kg in a m³ mixture | Input Variable |
| Superplasticizer (component 5) | quantitative | kg in a m³ mixture | Input Variable |
| Coarse Aggregate (component 6) | quantitative | kg in a m³ mixture | Input Variable |
| Fine Aggregate (component 7) | quantitative | kg in a m³ mixture | Input Variable |
| Age | quantitative | Day (1~365) | Input Variable |
| Concrete compressive strength | quantitative | MPa | Output Variable |

### WORKFLOW:
- Load the data.
- Check for missing values (if any exist, fill each one with the mean of its feature).
- Standardize the input variables. **Hint**: center the data.
- Split into 50% training, 30% test and 20% validation data (samples and labels).
- Model: an input layer sized to the number of features, 3 hidden layers with 10, 8 and 6 units, and an output layer; use relu or tanh activations (decide by experiment).
- Compilation step (note: this is a regression problem, so select the loss and metrics accordingly).
- Train the model for 100 epochs and validate it.
- If the model overfits, tune it by changing the number of units, the number of layers, the activation function or the epochs, or by adding a dropout layer or a regularizer as needed.
- Evaluation step
- Prediction
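The prescribed network is deliberately small. As a rough sanity check, its number of trainable weights can be counted by hand, assuming every layer is a fully connected (Dense) layer with a bias term, which is the Keras default. The helper `dense_param_count` is hypothetical, written only for this comparison with the 300-300 network built later in the notebook:

```python
def dense_param_count(layer_sizes):
    """Trainable parameters of a stack of fully connected layers.

    A layer mapping n_in -> n_out has n_in * n_out weights plus n_out biases.
    """
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

# Workflow architecture: 8 inputs -> 10 -> 8 -> 6 -> 1 output
assignment_sizes = [8, 10, 8, 6, 1]
# Architecture used later in this notebook: 8 inputs -> 300 -> 300 -> 1 output
notebook_sizes = [8, 300, 300, 1]

print(dense_param_count(assignment_sizes))  # 239
print(dense_param_count(notebook_sizes))    # 93301
```

With 239 parameters against 515 training rows under the 50% split, the prescribed model has limited capacity; the 300-300 network has roughly 390 times as many, which is one reason the overfitting step of the workflow may matter more for it.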


# Load Data:
[Click here to download the dataset](https://github.com/ramsha275/ML_Datasets/blob/main/compresive_strength_concrete.csv)

# Initialise & Imports

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

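# Forward slashes avoid the invalid "\c" escape and also work on Windows;
# the file name keeps the dataset's original spelling
DATADIR = "concrete/compresive_strength_concrete.csv"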

In [2]:
df = pd.read_csv(DATADIR)
df

| | Cement (component 1)(kg in a m^3 mixture) | Blast Furnace Slag (component 2)(kg in a m^3 mixture) | Fly Ash (component 3)(kg in a m^3 mixture) | Water (component 4)(kg in a m^3 mixture) | Superplasticizer (component 5)(kg in a m^3 mixture) | Coarse Aggregate (component 6)(kg in a m^3 mixture) | Fine Aggregate (component 7)(kg in a m^3 mixture) | Age (day) | Concrete compressive strength(MPa, megapascals) |
| -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28 | 79.99 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28 | 61.89 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270 | 40.27 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365 | 41.05 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360 | 44.30 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1025 | 276.4 | 116.0 | 90.3 | 179.6 | 8.9 | 870.1 | 768.3 | 28 | 44.28 |
| 1026 | 322.2 | 0.0 | 115.6 | 196.0 | 10.4 | 817.9 | 813.4 | 28 | 31.18 |
| 1027 | 148.5 | 139.4 | 108.6 | 192.7 | 6.1 | 892.4 | 780.0 | 28 | 23.70 |
| 1028 | 159.1 | 186.7 | 0.0 | 175.6 | 11.3 | 989.6 | 788.9 | 28 | 32.77 |


In [3]:
df.isnull().sum()

Cement (component 1)(kg in a m^3 mixture)                0
Blast Furnace Slag (component 2)(kg in a m^3 mixture)    0
Fly Ash (component 3)(kg in a m^3 mixture)               0
Water  (component 4)(kg in a m^3 mixture)                0
Superplasticizer (component 5)(kg in a m^3 mixture)      0
Coarse Aggregate  (component 6)(kg in a m^3 mixture)     0
Fine Aggregate (component 7)(kg in a m^3 mixture)        0
Age (day)                                                0
Concrete compressive strength(MPa, megapascals)          0
dtype: int64
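Every column reports 0 missing values, so no imputation is needed for this file. If a copy of the data did contain gaps, the workflow's "fill with the feature mean" step could look like the numpy sketch below. `fill_with_column_mean` is a hypothetical helper; in pandas the usual one-liner is `df.fillna(df.mean())`.

```python
import numpy as np

def fill_with_column_mean(x):
    """Replace NaNs in each column with that column's mean over non-missing values."""
    x = np.array(x, dtype=float)          # copy so the caller's array is untouched
    col_means = np.nanmean(x, axis=0)     # per-feature mean, ignoring NaNs
    rows, cols = np.where(np.isnan(x))    # positions of the missing entries
    x[rows, cols] = col_means[cols]
    return x

demo = np.array([[540.0, 28.0],
                 [np.nan, 28.0],
                 [332.5, np.nan]])
filled = fill_with_column_mean(demo)
# The NaNs become 436.25 (mean of column 0) and 28.0 (mean of column 1)
print(filled)
```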

In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1030 entries, 0 to 1029
Data columns (total 9 columns):
 #   Column                                                 Non-Null Count  Dtype  
---  ------                                                 --------------  -----  
 0   Cement (component 1)(kg in a m^3 mixture)              1030 non-null   float64
 1   Blast Furnace Slag (component 2)(kg in a m^3 mixture)  1030 non-null   float64
 2   Fly Ash (component 3)(kg in a m^3 mixture)             1030 non-null   float64
 3   Water  (component 4)(kg in a m^3 mixture)              1030 non-null   float64
 4   Superplasticizer (component 5)(kg in a m^3 mixture)    1030 non-null   float64
 5   Coarse Aggregate  (component 6)(kg in a m^3 mixture)   1030 non-null   float64
 6   Fine Aggregate (component 7)(kg in a m^3 mixture)      1030 non-null   float64
 7   Age (day)                                              1030 non-null   int64  
 8   Concrete compressive strength(MPa, megapascals)  
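The raw headers are long and irregular (note the double spaces in `Water  (component 4)` and `Coarse Aggregate  (component 6)`), which makes them easy to mistype. One option, sketched here with a hypothetical helper `short_name`, is to derive short snake_case names; `df.rename(columns=short_name)` would then apply it to every column. The notebook itself selects columns by position with `iloc`, so this step is optional.

```python
import re

def short_name(column):
    """Shorten e.g. 'Water  (component 4)(kg in a m^3 mixture)' to 'water'."""
    base = re.sub(r"\s*\(.*$", "", column)        # drop '(component N)(units)' or '(MPa, ...)'
    return re.sub(r"\s+", "_", base.strip()).lower()

raw = ["Cement (component 1)(kg in a m^3 mixture)",
       "Coarse Aggregate  (component 6)(kg in a m^3 mixture)",
       "Age (day)",
       "Concrete compressive strength(MPa, megapascals)"]
print([short_name(c) for c in raw])
# ['cement', 'coarse_aggregate', 'age', 'concrete_compressive_strength']
```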

# Standardize the Data

In [5]:
# Age is the only int64 column; cast it so every column is float64
df['Age (day)'] = df['Age (day)'].astype('float64')

In [6]:
df

| | Cement (component 1)(kg in a m^3 mixture) | Blast Furnace Slag (component 2)(kg in a m^3 mixture) | Fly Ash (component 3)(kg in a m^3 mixture) | Water (component 4)(kg in a m^3 mixture) | Superplasticizer (component 5)(kg in a m^3 mixture) | Coarse Aggregate (component 6)(kg in a m^3 mixture) | Fine Aggregate (component 7)(kg in a m^3 mixture) | Age (day) | Concrete compressive strength(MPa, megapascals) |
| -- | -- | -- | -- | -- | -- | -- | -- | -- | -- |
| 0 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1040.0 | 676.0 | 28.0 | 79.99 |
| 1 | 540.0 | 0.0 | 0.0 | 162.0 | 2.5 | 1055.0 | 676.0 | 28.0 | 61.89 |
| 2 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 270.0 | 40.27 |
| 3 | 332.5 | 142.5 | 0.0 | 228.0 | 0.0 | 932.0 | 594.0 | 365.0 | 41.05 |
| 4 | 198.6 | 132.4 | 0.0 | 192.0 | 0.0 | 978.4 | 825.5 | 360.0 | 44.30 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1025 | 276.4 | 116.0 | 90.3 | 179.6 | 8.9 | 870.1 | 768.3 | 28.0 | 44.28 |
| 1026 | 322.2 | 0.0 | 115.6 | 196.0 | 10.4 | 817.9 | 813.4 | 28.0 | 31.18 |
| 1027 | 148.5 | 139.4 | 108.6 | 192.7 | 6.1 | 892.4 | 780.0 | 28.0 | 23.70 |
| 1028 | 159.1 | 186.7 | 0.0 | 175.6 | 11.3 | 989.6 | 788.9 | 28.0 | 32.77 |


In [7]:
# Shuffle the rows; random_state makes the order reproducible between runs
df = df.sample(frac=1, random_state=42)

In [8]:
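# The first 8 columns are the inputs; column 9 is the compressive strength (MPa)
x = df.iloc[:, :8]
y = df.iloc[:, 8].to_numpy()   # plain array, so the fold slicing below is positional

# Note: fitting the scaler on all rows lets test-set statistics leak into the
# scaling; fitting it on the training split only avoids this
scaler = StandardScaler()

x = scaler.fit_transform(x)
#x = (x-x.mean()) / x.std()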
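In the cell above, the scaler sees all 1,030 rows before the split, so the test rows contribute to the mean and standard deviation used during training. A leak-free variant fits the statistics on the training split only and reuses them for validation and test data. The numpy sketch below illustrates this on random stand-in data (not the concrete dataset); the helper names are hypothetical, and with scikit-learn the equivalent is `scaler.fit_transform(x_train)` followed by `scaler.transform(x_test)`.

```python
import numpy as np

def fit_standardizer(x_train):
    """Learn per-feature mean and population std (ddof=0) from the training split only."""
    mean = x_train.mean(axis=0)
    std = x_train.std(axis=0)          # ddof=0, the convention StandardScaler uses
    std[std == 0] = 1.0                # leave constant features unscaled
    return mean, std

def apply_standardizer(x, mean, std):
    return (x - mean) / std

rng = np.random.default_rng(0)
features = rng.uniform(0, 500, size=(100, 8))   # stand-in for the 8 input columns
train, test = features[:80], features[80:]

mean, std = fit_standardizer(train)
train_z = apply_standardizer(train, mean, std)
test_z = apply_standardizer(test, mean, std)     # reuse the training statistics

print(np.allclose(train_z.mean(axis=0), 0.0), np.allclose(train_z.std(axis=0), 1.0))  # True True
```

The test split is only approximately centered under this scheme, which is expected. Also note that, applied to the DataFrame, the commented-out `(x-x.mean()) / x.std()` uses pandas' default `ddof=1`, so its result differs slightly from `StandardScaler`.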

# Split

In [11]:
#Split
#Split into 80% Training(Samples,Labels) , 20% Test(Samples,Labels)
train_ratio = 0.80
test_ratio = 0.20

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=1-train_ratio, random_state=42)

print('Train: {} - Test: {}'.format(len(x_train), len(x_test)))

Train: 824 - Test: 206


In [141]:
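# Alternative 50/30/20 split from the workflow. It is commented out, so running
# the notebook top to bottom keeps the 80/20 split above.
#Split into 50% Training(Samples,Labels) , 30% Test(Samples,Labels) and 20% Validation Data(Samples,Labels).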
#train_ratio = 0.50
#validation_ratio = 0.20
#test_ratio = 0.30

#x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=1-train_ratio, random_state=42)
#x_val, x_test, y_val, y_test = train_test_split(x_test, y_test, test_size=test_ratio/(test_ratio + validation_ratio), random_state=42)

#print('Train: {} - Val: {} - Test: {}'.format(len(x_train) , len(x_val) , len(x_test)))

Train: 515 - Val: 206 - Test: 309
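The commented-out cell builds the workflow's 50/30/20 split from two chained `train_test_split` calls: the first holds out 50% of the rows, and the second uses `test_size=0.3/(0.3+0.2)=0.6` to send 60% of that held-out half to the test set and the rest to validation. The same proportions can be sketched directly on shuffled row indices; this is an alternative to `train_test_split`, not what the notebook runs, and `three_way_split` is a hypothetical helper. For 1,030 rows it gives the same sizes as the recorded output.

```python
import numpy as np

def three_way_split(n, train_frac=0.5, val_frac=0.2, seed=42):
    """Shuffle row indices and cut them into train/validation/test index arrays."""
    idx = np.random.default_rng(seed).permutation(n)
    n_train = int(round(train_frac * n))
    n_val = int(round(val_frac * n))
    train_idx = idx[:n_train]
    val_idx = idx[n_train:n_train + n_val]
    test_idx = idx[n_train + n_val:]       # the remainder (about 30%) goes to test
    return train_idx, val_idx, test_idx

train_idx, val_idx, test_idx = three_way_split(1030)
print(len(train_idx), len(val_idx), len(test_idx))  # 515 206 309
```

The index arrays can then select rows from the scaled features and labels, e.g. `x[train_idx]` and `y[train_idx]`.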


# Create Model 

In [24]:
from tensorflow.keras import models
from tensorflow.keras import layers

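def build_model():
    # Two hidden ReLU layers of 300 units each (larger than the 10-8-6 network in the workflow)
    Network = models.Sequential()
    Network.add(layers.Dense(300, activation='relu', input_shape=(x_train.shape[1],) ))
    #Network.add(layers.Dense(300, activation='relu' ))
    Network.add(layers.Dense(300, activation='relu' ))
    # Single linear output unit: the predicted compressive strength
    Network.add(layers.Dense(1))

    # Compilation: MSE loss for regression, with MSE, MAE and MAPE reported as metrics
    Network.compile(optimizer='rmsprop', loss='mse', metrics=['mse','mae','mape'])

    return Network

# k-fold cross-validation on the training split: each fold is held out once for validation
k = 10
num_val_samples = len(x_train) // k
num_epochs = 100
all_scores = []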

for i in range(k):
    print('processing fold #', i)
    x_val = x_train[i * num_val_samples: (i + 1) * num_val_samples]
    y_val = y_train[i * num_val_samples: (i + 1) * num_val_samples]

    partial_x_train = np.concatenate([x_train[:i * num_val_samples], x_train[(i + 1) * num_val_samples:]], axis=0)
    partial_y_train = np.concatenate([y_train[:i * num_val_samples], y_train[(i + 1) * num_val_samples:]], axis=0)

    Network = build_model()
    history = Network.fit(partial_x_train, partial_y_train, epochs=num_epochs, batch_size=1, verbose=0)

    val_loss, val_mse, val_mae, val_mape = Network.evaluate(x_val, y_val, verbose=0)
    print("Mean Squared Error: {} --- Mean Absolute Error: {} -- Mean Absolute Percentage Error: {}".format(val_mse, val_mae, val_mape))
    
    all_scores.append(val_mae)

print("Mean MAE: ", np.mean(all_scores ))

processing fold # 0
Mean Squared Error: 37.26762390136719 --- Mean Absolute Error: 4.139495849609375 -- Mean Absolute Percentage Error: 13.67328929901123
processing fold # 1
Mean Squared Error: 33.18057632446289 --- Mean Absolute Error: 4.121759414672852 -- Mean Absolute Percentage Error: 13.763701438903809
processing fold # 2
Mean Squared Error: 27.560359954833984 --- Mean Absolute Error: 3.6608729362487793 -- Mean Absolute Percentage Error: 15.286680221557617
processing fold # 3
Mean Squared Error: 29.730754852294922 --- Mean Absolute Error: 3.924819231033325 -- Mean Absolute Percentage Error: 13.446793556213379
processing fold # 4
Mean Squared Error: 21.259845733642578 --- Mean Absolute Error: 3.3595190048217773 -- Mean Absolute Percentage Error: 11.158926010131836
processing fold # 5
Mean Squared Error: 18.671323776245117 --- Mean Absolute Error: 3.3027477264404297 -- Mean Absolute Percentage Error: 10.306880950927734
processing fold # 6
Mean Squared Error: 35.74169158935547 --- Me

In [25]:
np.mean(all_scores)

3.7926102
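The loop above sets `num_val_samples = len(x_train) // k`. With the 80/20 split, `x_train` has 824 rows, so each fold validates on 82 rows and the last 824 - 820 = 4 rows always stay in the training part. A small index-only sketch makes this visible; `kfold_indices` is a hypothetical helper that mirrors the loop's slicing, whereas `sklearn.model_selection.KFold` spreads the remainder across the first folds instead.

```python
import numpy as np

def kfold_indices(n, k):
    """Yield (train_idx, val_idx) for k contiguous folds, mirroring the loop above."""
    fold = n // k
    all_idx = np.arange(n)
    for i in range(k):
        val_idx = all_idx[i * fold:(i + 1) * fold]
        train_idx = np.concatenate([all_idx[:i * fold], all_idx[(i + 1) * fold:]])
        yield train_idx, val_idx

n, k = 824, 10                       # size of x_train after the 80/20 split
folds = list(kfold_indices(n, k))
validated = set(np.concatenate([v for _, v in folds]).tolist())
never_validated = set(range(n)) - validated
print(len(folds[0][1]), len(folds[0][0]))   # 82 742
print(sorted(never_validated))              # [820, 821, 822, 823]
```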

# Evaluate Model

In [33]:
# Check model performance on the training data
# (Network is the model trained in the last cross-validation fold above)
train_loss,train_mse,train_mae,train_mape = Network.evaluate(x_train, y_train)
print("Mean Square Error: ",train_mse,"\nMean Absolute Error:",train_mae,"\nMean Absolute Percentage Error:",train_mape)


# Check model performance on the test data
test_loss,test_mse,test_mae,test_mape = Network.evaluate(x_test, y_test)
print("Mean Square Error: ",test_mse,"\nMean Absolute Error:",test_mae,"\nMean Absolute Percentage Error:",test_mape)

Mean Square Error:  17.635988 
Mean Absolute Error: 2.8205974 
Mean Absolute Percentage Error: 9.4031725
Mean Square Error:  23.981066 
Mean Absolute Error: 3.6114628 
Mean Absolute Percentage Error: 12.071371
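The training MAE (about 2.82 MPa) is lower than the test MAE (about 3.61 MPa), the usual gap between seen and unseen data. To check such numbers independently of Keras, the three metrics can be recomputed from predictions. The sketch below follows the Keras definitions (MAPE divides by `max(|y|, epsilon)` and reports a percentage); `regression_metrics` is a hypothetical helper. Flattening first matters: with numpy arrays, subtracting an `(n, 1)` prediction array from an `(n,)` target array would broadcast to an `(n, n)` matrix.

```python
import numpy as np

def regression_metrics(y_true, y_pred, eps=1e-7):
    """Return (MSE, MAE, MAPE in percent) for 1-D targets and predictions."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()   # predict() returns shape (n, 1)
    err = y_true - y_pred
    mse = np.mean(err ** 2)
    mae = np.mean(np.abs(err))
    mape = 100.0 * np.mean(np.abs(err) / np.maximum(np.abs(y_true), eps))
    return mse, mae, mape

mse, mae, mape = regression_metrics([10.0, 20.0], [[12.0], [18.0]])
print(mse, mae, round(mape, 6))  # 4.0 2.0 15.0
```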


# Prediction
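`Network.predict` returns an array of shape `(n_samples, 1)` in the units of the training target. Since `y` is left in MPa here, predictions should also be in MPa (roughly the 20 to 80 range of the sample rows above). The values in the recorded output below lie roughly between -1.6 and 3.4, which looks like a standardized scale; if the target is standardized, predictions need to be mapped back with the training target's mean and standard deviation before comparison. A minimal sketch, where `to_original_units`, `y_mean` and `y_std` are hypothetical names and the five targets are taken from the first rows of the dataset:

```python
import numpy as np

def to_original_units(pred_z, y_mean, y_std):
    """Undo target standardization (y = z * std + mean) and flatten to shape (n,)."""
    return np.asarray(pred_z, dtype=float).ravel() * y_std + y_mean

# Strength values (MPa) from the first five rows, standing in for a training target
y_train_demo = np.array([79.99, 61.89, 40.27, 41.05, 44.30])
y_mean, y_std = y_train_demo.mean(), y_train_demo.std()
pred_z = (y_train_demo.reshape(-1, 1) - y_mean) / y_std   # shape (5, 1), like predict()

restored = to_original_units(pred_z, y_mean, y_std)
print(np.allclose(restored, y_train_demo))  # True
```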

In [59]:
predictions = Network.predict(x_test)
predictions

array([[-3.43806386e-01],
       [-1.87725440e-01],
       [-6.94800258e-01],
       [-1.67139456e-01],
       [ 1.29820123e-01],
       [-1.34914136e+00],
       [ 7.19047308e-01],
       [ 1.45461559e+00],
       [ 3.01989585e-01],
       [ 3.05399299e-01],
       [ 5.11862278e-01],
       [ 9.38045681e-01],
       [-1.22588694e+00],
       [-5.73768318e-01],
       [-1.43497562e+00],
       [-1.59071982e+00],
       [ 9.87488806e-01],
       [ 4.99727935e-01],
       [-1.33388877e+00],
       [ 2.35184655e-01],
       [ 1.44781470e+00],
       [ 3.35473204e+00],
       [ 2.58048820e+00],
       [-1.13802171e+00],
       [ 2.12715149e-01],
       [-1.07375979e+00],
       [-1.31938839e+00],
       [-5.90231895e-01],
       [-9.52220917e-01],
       [-6.13759279e-01],
       [ 6.42789304e-01],
       [-1.41423345e+00],
       [-8.74705434e-01],
       [ 2.02311325e+00],
       [-2.72853345e-01],
       [-1.34102714e+00],
       [-6.35600537e-02],
       [-1.05563387e-01],
       [ 6.4