<a>
  <img src='portada.png' width="1150">
</a>




# A. Build a baseline model


In [1]:
# We import libraries
import pandas as pd
import numpy as np


In [25]:
# We create the dataframe with the data obtained from the url dataset
df = pd.read_csv('concrete_data.csv')
df.head()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age,Strength
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28,79.99
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28,61.89
2,332.5,142.5,0.0,228.0,0.0,932.0,594.0,270,40.27
3,332.5,142.5,0.0,228.0,0.0,932.0,594.0,365,41.05
4,198.6,132.4,0.0,192.0,0.0,978.4,825.5,360,44.3


In [3]:
# We get a general idea of ​​what the dataframe contains
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1030 entries, 0 to 1029
Data columns (total 9 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Cement              1030 non-null   float64
 1   Blast Furnace Slag  1030 non-null   float64
 2   Fly Ash             1030 non-null   float64
 3   Water               1030 non-null   float64
 4   Superplasticizer    1030 non-null   float64
 5   Coarse Aggregate    1030 non-null   float64
 6   Fine Aggregate      1030 non-null   float64
 7   Age                 1030 non-null   int64  
 8   Strength            1030 non-null   float64
dtypes: float64(8), int64(1)
memory usage: 72.5 KB


In [4]:
# # Observe the size of the dataset
df.shape

(1030, 9)

In [5]:
# # Check if the dataframe contains null values
df.isnull().sum()

Cement                0
Blast Furnace Slag    0
Fly Ash               0
Water                 0
Superplasticizer      0
Coarse Aggregate      0
Fine Aggregate        0
Age                   0
Strength              0
dtype: int64

The target variable in this problem is the concrete sample strength. Therefore, our predictors will be all the other columns.


In [6]:
df_columns = df.columns

predictors = df[df_columns[df_columns != 'Strength']] # all columns except Strength
target = df['Strength']

In [7]:
# We check that it has separated well
predictors.head()

Unnamed: 0,Cement,Blast Furnace Slag,Fly Ash,Water,Superplasticizer,Coarse Aggregate,Fine Aggregate,Age
0,540.0,0.0,0.0,162.0,2.5,1040.0,676.0,28
1,540.0,0.0,0.0,162.0,2.5,1055.0,676.0,28
2,332.5,142.5,0.0,228.0,0.0,932.0,594.0,270
3,332.5,142.5,0.0,228.0,0.0,932.0,594.0,365
4,198.6,132.4,0.0,192.0,0.0,978.4,825.5,360


In [8]:
# In the same way with the target
target.head()

0    79.99
1    61.89
2    40.27
3    41.05
4    44.30
Name: Strength, dtype: float64

We store the number of predictors in the variable n_cols. Since we will use it as input to our neural network

In [9]:
n_cols = predictors.shape[1]
n_cols

8



## Build a Neural Network 
### Let's import keras


In [10]:
# We import the Keras library. In turn the functions of Sequential and Dense
import keras
from keras.models import Sequential
from keras.layers import Dense
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error


Create a function that defines our regression model for us so that we can conveniently call it to create our model:

- One hidden layer of 10 nodes, and a ReLU activation function
- Use the adam optimizer and the mean squared error as the loss function.

In [11]:
#num of inputs = num of predictors colums
def regression_model():
    model = Sequential()
    model.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model.add(Dense(1))
    
    # compile model
    model.compile(optimizer='adam', loss='mean_squared_error')
    return model

In [12]:
#Build the model
model = regression_model()

In [13]:
#Randomly split the data into a training set (70%) and a test set (30%):  
X_train, X_test, y_train, y_test = train_test_split(predictors, target, test_size=0.3)

In [14]:
# Next, we will train and test the model at the same time using the *fit* method. We will train the model for 50 epochs.
reg = model.fit(X_train, y_train, epochs=50, verbose=0, validation_data=(X_test, y_test))

We evaluate the model on the test data and calculate the root mean square error between the predicted concrete strength and the actual concrete strength. We use the mean_squared_error function from Scikit-learn.

In [16]:
#Find mean_squared_error as last value in history.
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(mse)

1405.0685347664228


Here, an empty list called mse_list is created to store the root mean square errors. Then a for loop is used to repeat steps 1 through 3 50 times. Inside the loop, the data is randomly divided into a training set and a test set, the model is trained and evaluated on the test data. Each root mean square error is stored in the mse_list. Finally, we use Numpy's np.mean function to calculate the mean of the mean squared errors and the np.std function to calculate the standard deviation.

In [15]:
mse_list = []

for i in range(50):
    X_train, X_test, y_train, y_test = train_test_split(predictors, target, test_size=0.3)

    # Training the model
    reg = model.fit(X_train, y_train, epochs=50, verbose=0, validation_data=(X_test, y_test))

    # Prediction on test data
    y_pred = model.predict(X_test)

    # Calculation of root mean square error
    mse = mean_squared_error(y_test, y_pred)
    mse_list.append(mse)
    print("Mean Squared Error in epoch ", i, " is: ", mse)

mean_mse = np.mean(mse_list)
std_mse = np.std(mse_list)

print("Mean of the mean squared errors: ", mean_mse)
print("Standard deviation of the mean squared errors: ", std_mse)

Mean Squared Error in epoch  0  is:  156.4518062900823
Mean Squared Error in epoch  1  is:  116.13129995967574
Mean Squared Error in epoch  2  is:  93.34491563895372
Mean Squared Error in epoch  3  is:  102.67750390213826
Mean Squared Error in epoch  4  is:  87.84534045267755
Mean Squared Error in epoch  5  is:  91.3539964377937
Mean Squared Error in epoch  6  is:  78.98367173087979
Mean Squared Error in epoch  7  is:  85.28051788341159
Mean Squared Error in epoch  8  is:  106.94380122870335
Mean Squared Error in epoch  9  is:  72.21134833573788
Mean Squared Error in epoch  10  is:  81.2030407232727
Mean Squared Error in epoch  11  is:  68.23815237778534
Mean Squared Error in epoch  12  is:  59.65312541340476
Mean Squared Error in epoch  13  is:  54.332621621669716
Mean Squared Error in epoch  14  is:  52.205648990359
Mean Squared Error in epoch  15  is:  54.802629648707
Mean Squared Error in epoch  16  is:  53.26745210405882
Mean Squared Error in epoch  17  is:  55.61610980054707
Mean

In [16]:
print("Mean of the mean squared errors: ", mean_mse)
print("Standard deviation of the mean squared errors: ", std_mse)

Mean of the mean squared errors:  62.23427011851866
Standard deviation of the mean squared errors:  21.938660691702605


---

---


# B. Normalize the data




In [17]:
# Now we need to normalize the data. We'll do this by subtracting the mean and dividing by the standard deviation.
predictors_norm = (predictors - predictors.mean()) / predictors.std()
predictors_norm.head()
n_cols = predictors_norm.shape[1]


In [18]:
def regression_model2():
    model2 = Sequential()
    model2.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model2.add(Dense(1))
    
    # compile model
    model2.compile(optimizer='adam', loss='mean_squared_error')
    return model2
model2 = regression_model2()

Train and test the model at the same time using the fit-method. We will leave out 30% of the data for validation and we will train the model for 50 epochs. And use predictors_norm instead of predictors.

In [19]:
mse_list = []

for i in range(50):
    X_train, X_test, y_train, y_test = train_test_split(predictors, target, test_size=0.3)

    # Training the model
    reg = model2.fit(X_train, y_train, epochs=50, verbose=0, validation_data=(X_test, y_test))

    # Prediction on test data
    y_pred = model2.predict(X_test)

    # Calculation of root mean square error
    mse = mean_squared_error(y_test, y_pred)
    mse_list.append(mse)
    print("Mean Squared Error in epoch ", i, " is: ", mse)

mean_mse = np.mean(mse_list)
std_mse = np.std(mse_list)

print("Mean of the mean squared errors: ", mean_mse)
print("Standard deviation of the mean squared errors: ", std_mse)

Mean Squared Error in epoch  0  is:  152.84085781294618
Mean Squared Error in epoch  1  is:  112.34366141084911
Mean Squared Error in epoch  2  is:  114.863909763286
Mean Squared Error in epoch  3  is:  105.18250725634856
Mean Squared Error in epoch  4  is:  117.46409185284789
Mean Squared Error in epoch  5  is:  109.06680067157387
Mean Squared Error in epoch  6  is:  108.67521821490871
Mean Squared Error in epoch  7  is:  110.49828021652212
Mean Squared Error in epoch  8  is:  117.50454587685498
Mean Squared Error in epoch  9  is:  114.07755244740737
Mean Squared Error in epoch  10  is:  107.36601869533916
Mean Squared Error in epoch  11  is:  98.66660845093422
Mean Squared Error in epoch  12  is:  130.3868067659021
Mean Squared Error in epoch  13  is:  125.7129614466891
Mean Squared Error in epoch  14  is:  104.43560939371571
Mean Squared Error in epoch  15  is:  115.03765725362716
Mean Squared Error in epoch  16  is:  118.68866554906019
Mean Squared Error in epoch  17  is:  117.1926

In [20]:
print("Mean of the mean squared errors: ", mean_mse)
print("Standard deviation of the mean squared errors: ", std_mse)

Mean of the mean squared errors:  115.55327571966389
Standard deviation of the mean squared errors:  11.357851008862163


### How does the mean of the mean squared errors compare to that from Step A?

Conclusions: <br>
We found that normalizing the data did not work too well. Both the mean and the standard deviation of the errors have been larger than in test one. Likewise, the difference has not been much and very large errors are handled.

---

---


# C. Increate the number of epochs 




This time I will repeat Part B but I will use 100 epochs for training. I will continue to use the normalized data

In [21]:
def regression_model3():
    model3 = Sequential()
    model3.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model3.add(Dense(1))
    
    # compile model
    model3.compile(optimizer='adam', loss='mean_squared_error')
    return model3
model3 = regression_model3()

In [22]:
mse_list = []

for i in range(50):
    X_train, X_test, y_train, y_test = train_test_split(predictors, target, test_size=0.3)

    # Training the model
    reg = model3.fit(X_train, y_train, epochs=100, verbose=0, validation_data=(X_test, y_test))

    # Prediction on test data
    y_pred = model3.predict(X_test)

    # Calculation of root mean square error
    mse = mean_squared_error(y_test, y_pred)
    mse_list.append(mse)
    print("Mean Squared Error in epoch ", i, " is: ", mse)

mean_mse = np.mean(mse_list)
std_mse = np.std(mse_list)

print("")
print("Mean of the mean squared errors: ", mean_mse)
print("Standard deviation of the mean squared errors: ", std_mse)

Mean Squared Error in epoch  0  is:  160.48142988718405
Mean Squared Error in epoch  1  is:  110.27403360097689
Mean Squared Error in epoch  2  is:  101.13419954761574
Mean Squared Error in epoch  3  is:  111.98366140438955
Mean Squared Error in epoch  4  is:  99.79857672644756
Mean Squared Error in epoch  5  is:  119.69931538629947
Mean Squared Error in epoch  6  is:  117.62072280707787
Mean Squared Error in epoch  7  is:  114.03399140174018
Mean Squared Error in epoch  8  is:  107.75120577073389
Mean Squared Error in epoch  9  is:  103.49937281815818
Mean Squared Error in epoch  10  is:  110.83761200703154
Mean Squared Error in epoch  11  is:  120.78305053038625
Mean Squared Error in epoch  12  is:  104.11796481470704
Mean Squared Error in epoch  13  is:  121.77264248502483
Mean Squared Error in epoch  14  is:  111.98107577505404
Mean Squared Error in epoch  15  is:  114.78623617777478
Mean Squared Error in epoch  16  is:  110.34114338691154
Mean Squared Error in epoch  17  is:  110.

How does the mean of the mean squared errors compare to that from Step B?

**Conclusions:** <br>
Here the results have improved considerably. It can be said that the more epochs the better result. Even so, it is not a small error either. the standard deviation of the root mean square of the error has been quite low compared to previous cases.

---


---


# D. Increase the number of hidden layers



Repeat part B but use a neural network with the following instead:

- Three hidden layers, each of 10 nodes and ReLU activation function.

In [23]:
def regression_model4():
    model4 = Sequential()
    model4.add(Dense(10, activation='relu', input_shape=(n_cols,)))
    model4.add(Dense(10, activation='relu'))
    model4.add(Dense(10, activation='relu'))
    model4.add(Dense(1))
    
    # compile model
    model4.compile(optimizer='adam', loss='mean_squared_error')
    return model4
model4 = regression_model4()

In [24]:
mse_list = []

for i in range(50):
    X_train, X_test, y_train, y_test = train_test_split(predictors, target, test_size=0.3)

    # Training the model
    reg = model4.fit(X_train, y_train, epochs=50, verbose=0, validation_data=(X_test, y_test))

    # Prediction on test data
    y_pred = model4.predict(X_test)

    # Calculation of root mean square error
    mse = mean_squared_error(y_test, y_pred)
    mse_list.append(mse)
    print("Mean Squared Error in epoch ", i, " is: ", mse)

mean_mse = np.mean(mse_list)
std_mse = np.std(mse_list)

print("")
print("Mean of the mean squared errors: ", mean_mse)
print("Standard deviation of the mean squared errors: ", std_mse)

Mean Squared Error in epoch  0  is:  108.0608692856171
Mean Squared Error in epoch  1  is:  54.704863153619
Mean Squared Error in epoch  2  is:  62.72714058941757
Mean Squared Error in epoch  3  is:  52.122600596185535
Mean Squared Error in epoch  4  is:  57.5921584766869
Mean Squared Error in epoch  5  is:  43.84366247044178
Mean Squared Error in epoch  6  is:  45.46559327578457
Mean Squared Error in epoch  7  is:  46.265972972693056
Mean Squared Error in epoch  8  is:  48.568420950880494
Mean Squared Error in epoch  9  is:  49.19031437202601
Mean Squared Error in epoch  10  is:  46.58144354189668
Mean Squared Error in epoch  11  is:  52.84597596300285
Mean Squared Error in epoch  12  is:  49.51969209332703
Mean Squared Error in epoch  13  is:  64.20209712062395
Mean Squared Error in epoch  14  is:  50.45589175083448
Mean Squared Error in epoch  15  is:  48.28216206623831
Mean Squared Error in epoch  16  is:  47.94720177216017
Mean Squared Error in epoch  17  is:  57.8692380886732
Mea

**Conclusions:** <br>
In this case the standard deviation of the root mean square of the error has has remained the same. <br>
What has improved is the mean squared error, dropping to 37. We are beginning to handle acceptable values ​​for the model.
<br>
<br>
In conclusion, test 4 has been the one that has yielded the best data. According to the study of regressive neural networks, it can be said that the most important parameter of training in a neural network has not been the treatment of the data at the beginning, nor the epochs. What has really made the difference has been adding hidden layers to give consistency to the neural network. Thank you!