# Building ANN using Tensorflow 2 for Regression
### (Building an ANN Regression model to predict electrical energy output of a combined cycle power plant)
- Prediction of full load electrical power output of a base load operated combined cycle power plant using machine learning methods
- The dataset contains 9568 data points collected from a **Combined Cycle Power Plant** over 6 years (2006-2011), while the power plant<br> 
  was set to work at full load. The features are the **hourly average ambient variables Temperature (T)**, **Ambient Pressure (AP)**,<br>
  **Relative Humidity (RH)** and **Exhaust Vacuum (V)**, which are used to predict the **net hourly electrical energy output (EP) of the plant**.<br>
  The data source is [UCI Machine Learning Repository](https://archive.ics.uci.edu/ml/datasets/Combined+Cycle+Power+Plant).<br>
  [Udemy-Regression for this dataset](https://www.udemy.com/course/linear-regression-with-artificial-neural-network/learn/lecture/18889080#overview)

# Part 0 -  libraries

### &nbsp; 1. Importing the libraries

In [50]:
import numpy as np
# import matplotlib.pyplot as plt
import pandas as pd
import tensorflow as tf

# Part 1 - Data Preprocessing

### &nbsp; 1. Importing the dataset

In [51]:
# Importing the dataset (an .xlsx file, which can also be opened in Excel)
dataset = pd.read_excel('Folds5x2_pp.xlsx')

# Before training a model, we always create two separate subsets

# 1. Creating the matrix of features, i.e. all the columns containing the features
X = dataset.iloc[:, :-1].values     # iloc (integer location) selects rows/columns by index
                                    # :   means all the rows
                                    # :-1 means all the columns except the last one
                                  
# 2. Creating the dependent variable vector, i.e. the column containing the variable we want to predict.
Y = dataset.iloc[:, -1].values      # -1 means the last column, in other words -1 is the index of the last column

In [52]:
print(X)

[[  14.96   41.76 1024.07   73.17]
 [  25.18   62.96 1020.04   59.08]
 [   5.11   39.4  1012.16   92.14]
 ...
 [  31.32   74.33 1012.92   36.48]
 [  24.48   69.45 1013.86   62.39]
 [  21.6    62.52 1017.23   67.87]]


In [53]:
print(Y)

[463.26 444.37 488.56 ... 429.57 435.74 453.28]
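The slicing rules used with `iloc` above are the same as plain NumPy slicing, so they can be checked on a tiny array. The sketch below is only an illustration: the array `data` is rebuilt by hand from the first three rows of the printed `X` and `Y`, and the names `data`, `X_small` and `Y_small` are not part of the notebook.

```python
import numpy as np

# First three rows of the dataset, rebuilt from the printed X and Y above
# (four feature columns as printed, then the target in the last column).
data = np.array([
    [14.96, 41.76, 1024.07, 73.17, 463.26],
    [25.18, 62.96, 1020.04, 59.08, 444.37],
    [5.11, 39.40, 1012.16, 92.14, 488.56],
])

X_small = data[:, :-1]   # all rows, every column except the last
Y_small = data[:, -1]    # all rows, last column only

print(X_small.shape)  # (3, 4)
print(Y_small.shape)  # (3,)
```

The feature matrix is 2-D (one row per observation) while the target is a 1-D vector, which is why `Y` prints on a single line.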


### &nbsp; &nbsp; 2. Encoding categorical data
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;  **‼️ We don't have categorical variable ‼️**

#### &nbsp; &nbsp; &nbsp; &nbsp; ❌2.1 Label Encoding the "Gender" column

#### &nbsp; &nbsp; &nbsp; &nbsp; ❌2.2 One Hot Encoding the "name_of_coutries" column


### &nbsp; &nbsp; 3. Splitting the dataset into the Training set and Test set
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; - We want to train our ANN on a separate set called **Training set** <br> 
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; - And evaluate its performance on a separate set called the **Test set** <br> 

In [54]:
# The model_selection module of scikit-learn contains the train_test_split function, which allows us to split our dataset into training and test sets.
from sklearn.model_selection import train_test_split

# Create 4 variables to collect what is returned by the train_test_split function
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.2, random_state = 0)
# test_size = 0.2 means 20% of the dataset will be used for testing
# and the remaining 80% of the dataset will be used for training
# random_state = 0 fixes the seed, so the split will be the same each time we run the program

### &nbsp; &nbsp; 4. Feature Scaling 
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; **Note**: Feature Scaling is absolutely essential in Deep Learning. <br>
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; i.e. We're **standardizing** (often loosely called **normalizing**) the data so that each feature has a mean of zero and a standard deviation of one. <br>
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; - It is applied to all our feature variables, irrespective of whether they are already in the desired scale/range <br>


In [55]:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()

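# Applying feature scaling to the features (X) of both the training and test sets; the target Y is left unscaled.
X_train = sc.fit_transform(X_train)         # the scaler is fitted on the training set only, in order to avoid information leakage.
X_test = sc.transform(X_test)               # the test set is transformed with the mean and standard deviation learned from the training set.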
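To make the "fit on train, transform both" idea concrete, here is a minimal NumPy sketch of what standardization does. It is not the notebook's code: the two small arrays are illustrative stand-ins built from the six rows printed for `X`, and the names `X_train_demo` and `X_test_demo` are hypothetical.

```python
import numpy as np

# Illustrative stand-ins for the real split: the six rows printed for X above.
X_train_demo = np.array([
    [14.96, 41.76, 1024.07, 73.17],
    [25.18, 62.96, 1020.04, 59.08],
    [5.11, 39.40, 1012.16, 92.14],
    [31.32, 74.33, 1012.92, 36.48],
])
X_test_demo = np.array([
    [24.48, 69.45, 1013.86, 62.39],
    [21.60, 62.52, 1017.23, 67.87],
])

# "fit": learn the per-column mean and standard deviation from the training rows only
mean = X_train_demo.mean(axis=0)
std = X_train_demo.std(axis=0)   # population standard deviation (ddof=0)

# "transform": apply the training statistics to both sets
X_train_scaled = (X_train_demo - mean) / std
X_test_scaled = (X_test_demo - mean) / std

print(X_train_scaled.mean(axis=0))  # approximately 0 for every column
print(X_train_scaled.std(axis=0))   # approximately 1 for every column
print(X_test_scaled.mean(axis=0))   # generally not 0: test rows use the training statistics
```

Because the test rows never contribute to `mean` and `std`, nothing about the test set leaks into the preprocessing.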

# Part 2 - Building the ANN
- Structure of the ANN we're going to build
  
  ![alt text](image.png)
  - We have 4 layers:
    - Input Layer in yellow
    - First Hidden Layer containing 6 neurons in green    ✅
    - Second Hidden Layer containing 6 neurons in green   ✅
    - Output Layer containing 1 neuron in red
  - We can change the architecture if we want i.e. instead of 6 neurons in each layer we can have any  number of neurons.
  
  #### &nbsp; &nbsp; &nbsp; &nbsp; Determine the number of neurons in the hidden layers.

&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; There is no fixed rule for the number of neurons; it is found by experiment, with the guidelines below as a starting point.<br>
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; - Start with a number of hidden neurons between the **number of input features (n)** and **2–3× n**, then **tune** based on **validation performance**<br>
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; - Too few neurons → underfitting.<br>
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; - Too many neurons → overfitting, slower training, more compute.<br>
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; - Use dropout or L2 regularization if you go large.<br>
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; - Always monitor validation loss to adjust the architecture.<br>
<br>
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; - n is the number of input features; these guidelines assume fully connected (dense) layers used for classification or regression.

| Situation                        | Suggested Hidden Neurons                    |
| -------------------------------- | ------------------------------------------- |
| **Simple problem**               | $\text{hidden\_neurons} \approx n \ \text{or} \approx \frac{n}{2}$ |
| **Moderate complexity** ✅       | $\text{hidden\_neurons}\approx 1.5n \ \text{or} \approx 2n$ |
| **High complexity / large data** | $\text{hidden\_neurons} \approx 3n$ or more |

$$\text{hidden\_neurons}\approx 1.5n = 1.5 \times 4 = 6 ✅$$
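The choice of 6 neurons per hidden layer also fixes how many trainable parameters the network has. The sketch below counts them for the 4-6-6-1 architecture described above, assuming every layer is fully connected with one bias per neuron; the helper `dense_params` is a hypothetical name introduced only for this illustration.

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per neuron for a fully connected layer."""
    return n_inputs * n_units + n_units

n_features = 4                         # number of input features
hidden_units = int(1.5 * n_features)   # 6, from the rule of thumb above

layer_sizes = [n_features, hidden_units, hidden_units, 1]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes[:-1], layer_sizes[1:])]

print(per_layer)       # [30, 42, 7]
print(sum(per_layer))  # 79
```

With only 79 parameters and several thousand training rows, this is a small model, which fits the "moderate complexity" row of the table.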


### &nbsp; &nbsp; 1. Initializing the ANN
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; - Here we'll initialize the ANN as a sequence of layers

In [56]:
# Creating an object of the Sequential class 
   # The Sequential class belongs to the models module of the Keras library
        # And the Keras library is part of the TensorFlow package
    
ann = tf.keras.models.Sequential()

### &nbsp; &nbsp; 2. Adding the input layer and the first hidden layer

In [57]:
# With the add() method we can add layers to the ANN
ann.add(tf.keras.layers.Dense(units = 6 , activation = 'relu'))    # Dense creates the full connection between the input layer and the 1st hidden layer,
                                                                        # meaning each neuron in the input layer is fully connected to each neuron in the hidden layer.
                                                                            # The Dense class is taken from the layers module of the Keras library.
                                                                            # The layers module contains tools to add layers to our ANN.
                                                                   # units corresponds to the number of neurons we want to have in the 1st hidden layer. 
                                                                   # The number of neurons in the input layer is equal to the number of feature variables.
                                                                        # We don't have to specify the number of neurons in the input layer because TensorFlow infers it automatically from the input data.
                                                    # The activation function is the rectifier activation function, i.e. the ReLU activation function.
                                                    # ReLU is one of the most popular activation functions.
# My units = number of input features * 1.5 = 4 * 1.5 = 6
# 6 is the number of neurons in this hidden layer; it is a hyperparameter value.

### &nbsp; &nbsp; 3. Adding the second hidden layer
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; - Here we will add a 2nd hidden layer in order to build a deep learning model as opposed to a shallow model.

In [58]:
# We just copy the above code and paste it here in order to add a second hidden layer
ann.add(tf.keras.layers.Dense(units = 6 , activation = 'relu'))        # units corresponds to the number of neurons we want to have in the 2nd hidden layer...  
                                                                       # ... which will be connected automatically to the previous layer.


### &nbsp; &nbsp; 4. Adding the output layer
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; - Here we will add the output layer which will contain what we want to predict.

In [59]:
# We use the add() method again, this time to add the output layer
ann.add(tf.keras.layers.Dense(units = 1))       # units corresponds to the number of neurons we want to have in the output layer...  
                                                    # ... which will be connected automatically to the previous layer.
                                                # For the activation function we could use sigmoid, softmax or no activation function...
                                                    #... but sigmoid (for only 2 categories to predict, i.e. 0 or 1) and ...
                                                    #... softmax (for more than 2 categories to predict) are used for classification problems, and here we're...
                                                    #... dealing with a regression problem (i.e. we want to predict a continuous number as output), therefore we use no activation function.
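What the three `Dense` layers compute can be sketched in a few lines of NumPy: each hidden layer is a matrix product plus a bias followed by ReLU, and the output layer is the same without any activation. This is only an illustration of the forward pass, not Keras internals; the weights are random rather than trained, and the names `W1`, `b1`, `forward` and so on are made up for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    # Rectifier: keeps positive values, replaces negative values with 0
    return np.maximum(z, 0.0)

# Randomly initialised weights, for illustration only (not the trained values)
W1, b1 = rng.normal(size=(4, 6)), np.zeros(6)   # input (4 features) -> hidden layer 1
W2, b2 = rng.normal(size=(6, 6)), np.zeros(6)   # hidden layer 1 -> hidden layer 2
W3, b3 = rng.normal(size=(6, 1)), np.zeros(1)   # hidden layer 2 -> output

def forward(x):
    h1 = relu(x @ W1 + b1)
    h2 = relu(h1 @ W2 + b2)
    return h2 @ W3 + b3        # no activation: the output can be any real number

batch = rng.normal(size=(5, 4))    # 5 made-up, already-scaled observations
print(forward(batch).shape)        # (5, 1)
```

Leaving the last layer linear is what lets the network output values such as 463.26; a sigmoid would squash every prediction into the range 0 to 1.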
                                                                       

# Part 3 - Training the ANN

### &nbsp; &nbsp; 1. Compiling the ANN with an optimizer, a loss function and metrics
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; - **Optimizer**: the algorithm with which we'll perform Stochastic Gradient Descent, <br>
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; i.e. updating the weights of the network, using the gradients computed by backpropagation, in order to reduce the loss<br>
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; - **Loss function**: the function that measures the error between the predictions and the real values, and that training minimizes.<br>
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; - **Metrics**: optional additional measures used to evaluate the performance of our model; none are passed to compile() in this notebook.<br>

In [60]:
# Using the compile method of the Sequential class to compile the model
ann.compile(optimizer = 'adam', loss = 'mean_squared_error')   # the adam optimizer is a variant of SGD, used to reduce the loss
                                # The usual loss function for a regression problem is mean_squared_error; its square root (RMSE, root mean squared error) is often reported as well

### &nbsp; &nbsp; 2. Training the ANN on the Training set over certain number of epochs

In [61]:
# Using the fit() method to train our model
ann.fit(X_train, Y_train, batch_size = 32, epochs = 100 )   # Here we specify the set on which we want to fit our ANN, i.e. X_train and Y_train
                                                            # We also specify the batch_size and the number of epochs
                                                                # over the epochs/rounds the loss is gradually reduced.
                                                                # epochs = 100 is our choice (the Keras default is 1)
                                                            # batch_size is the number of observations that are propagated forward before each weight update
                                                                # batch_size = 32 is the default value 

Epoch 1/100
240/240 ━━━━━━━━━━━━━━━━━━━━ 2s 2ms/step - loss: 204200.3906
Epoch 2/100
240/240 ━━━━━━━━━━━━━━━━━━━━ 1s 2ms/step - loss: 190275.5156
Epoch 3/100
240/240 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - loss: 148726.7656
Epoch 4/100
240/240 ━━━━━━━━━━━━━━━━━━━━ 0s 1ms/step - loss: 87974.5859
Epoch 5/100
240/240 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - loss: 41592.6992
Epoch 6/100
240/240 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - loss: 23195.7773
Epoch 7/100
240/240 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - loss: 17376.7480
Epoch 8/100
240/240 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - loss: 13861.2676
Epoch 9/100
240/240 ━━━━━━━━━━━━━━━━━━━━ 0s 2ms/step - loss: 10859.5537
Epoch 10/100
240/240 ━━━━━━━━

<keras.src.callbacks.history.History at 0x298bf5182d0>

- We can see that the loss decreases over the epochs
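The `240/240` in the training log is the number of batches per epoch, and it can be recovered from the dataset size, the split and the batch size. The arithmetic below is a sketch under one assumption, stated in the comment: that the 20% test share is rounded up and the rest goes to training.

```python
import math

n_samples = 9568      # size of the dataset, from the introduction
test_size = 0.2
batch_size = 32

# Assumption: the test share is rounded up and the remaining rows are used for training
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

steps_per_epoch = math.ceil(n_train / batch_size)   # the last batch may be smaller than 32
predict_steps = math.ceil(n_test / batch_size)

print(n_train, n_test)                  # 7654 1914
print(steps_per_epoch, predict_steps)   # 240 60
```

The second number, 60, matches the `60/60` progress counter that appears when `predict()` is called on the test set.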

# Part 4 - Making the predictions and evaluating the model
- Since our neural network is now trained, we can use it to predict the result of new observations
- This step is called **inference**, which consists of predicting the result of new observations

### &nbsp; &nbsp; 1. Predict the result of a single observation
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;  - We use our ANN model to predict the result of a single observation<br>
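A single observation has to be passed as a 2-D array with one row, and it has to be scaled with the scaler fitted on the training set. The sketch below only demonstrates the shape requirement with NumPy; the values are the first printed row of `X`, and the final commented call is an assumption about how the notebook's `sc` and `ann` objects would be reused, not code taken from it.

```python
import numpy as np

# First printed row of X, used here as an example observation
flat = np.array([14.96, 41.76, 1024.07, 73.17])
print(flat.shape)        # (4,)  -> a 1-D vector, not accepted as a batch of observations

# Reshape to one row and four columns: a batch containing a single observation
single = flat.reshape(1, -1)
print(single.shape)      # (1, 4)

# Writing the values with double square brackets gives the same 2-D shape directly
same = np.array([[14.96, 41.76, 1024.07, 73.17]])
print(np.array_equal(single, same))   # True

# Assumed usage in the notebook, with the scaler and model defined earlier:
# ann.predict(sc.transform(single))
```

The result of such a call would itself be a 2-D array of shape (1, 1), containing the predicted energy output.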

### &nbsp; &nbsp; 2. Predicting the Test set results

In [62]:
# Using the predict() method to predict the results of the test set
    # Y_pred will store all the predictions for the test set
Y_pred = ann.predict(X_test)

60/60 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step


- Displaying next to each other the predicted values and the real values

In [63]:
# Comparing the predicted and real results
np.set_printoptions(precision = 2)                                  # printing the values with 2 decimal places

# The dependent variable is a 1-D (horizontal) vector, as we saw when printing Y in Part 1, so we reshape it into a column, and do the same for Y_pred...
# ... then concatenate the 2 column vectors next to each other (axis 1) and print the result
print(np.concatenate((Y_pred.reshape(len(Y_pred), 1), Y_test.reshape(len(Y_test), 1)), 1))       # reshape turns each vector into a single column

[[430.76 431.23]
 [459.1  460.01]
 [463.77 461.14]
 ...
 [470.31 473.26]
 [442.41 438.  ]
 [462.5  463.28]]


#### Each prediction is very close to the real result, which is awesome, meaning that our ANN performed very well

### &nbsp; &nbsp; ❌3. Making the Confusion Matrix   <== ‼️ (not used in regression, only for classification)‼️
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; - This step would give the final accuracy on the test set, which only makes sense for classification.
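For regression, error metrics take the place of the confusion matrix and accuracy. As a minimal sketch, the code below computes MAE, MSE, RMSE and R² by hand on the six (predicted, real) pairs printed above. Those six rows are only the visible part of the output, so the numbers illustrate the formulas and are not the score of the full test set.

```python
import math

# The six (predicted, real) pairs printed above; not the full test set
y_pred = [430.76, 459.10, 463.77, 470.31, 442.41, 462.50]
y_true = [431.23, 460.01, 461.14, 473.26, 438.00, 463.28]

errors = [p - t for p, t in zip(y_pred, y_true)]
n = len(errors)

mae = sum(abs(e) for e in errors) / n       # mean absolute error
mse = sum(e * e for e in errors) / n        # mean squared error (the training loss)
rmse = math.sqrt(mse)                       # same unit as the target

# R^2: share of the variance of the real values explained by the predictions
mean_true = sum(y_true) / n
ss_res = sum(e * e for e in errors)
ss_tot = sum((t - mean_true) ** 2 for t in y_true)
r2 = 1 - ss_res / ss_tot

print(f"MAE={mae:.3f}  MSE={mse:.3f}  RMSE={rmse:.3f}  R2={r2:.3f}")
```

On the full test set the same quantities could be computed from `Y_test` and `Y_pred` after flattening `Y_pred` to one dimension.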