#  Classification: Backpropagation Algorithm 

<p style='text-align: justify;'> 
In this section we will study how the Backpropagation algorithm works.
</p>

## Objectives

* **Understand** the concept of Backpropagation algorithm and its role in training Multilayer Perceptron Networks;
* **Learn** about the mathematical principles behind Backpropagation;
* **Explore** the different components of Backpropagation;
* **Create** a Backpropagation algorithm to train a Multilayer Perceptron Network;
* **Train** the Multilayer Perceptron Network using Backpropagation;
* **Observe** the performance of the trained Multilayer Perceptron Network Model using Backpropagation.

#
#### ⊗ The Problem: Learning in the Multilayer Perceptron

<p style='text-align: justify;'> 
Training a multilayer perceptron requires a way to adjust the weights of its hidden layers. Unlike the output neurons, hidden neurons have no target values of their own, so the prediction error cannot be used directly to correct their weights. Without a method that distributes the prediction error among all the weights, the model cannot be trained properly and will not improve its predictions over time.
</p>  

#
#### ⊗ The Solution: Backpropagation

<p style='text-align: justify;'> 
Backpropagation solves this problem by propagating the error obtained at the output backward through the network, layer by layer. Using the chain rule, it computes how much each weight and bias contributed to the error (the gradient), so that all of them, including those in the hidden layers, can be updated iteratively. This is what allows an MLP to learn non-linear relationships from data.
</p>  

#
#### ⊗ How Backpropagation works in a neural network

<p style='text-align: justify;'>
<b>Backpropagation</b> is a widely used algorithm for training <b>multi-layer perceptron</b> (MLP) neural networks. It is used together with gradient descent, an optimization method that aims to minimize the error between the network's predicted output and the actual target output. The backpropagation algorithm efficiently computes the gradient of the error function with respect to the network's weights, and gradient descent uses this gradient to adjust the weights iteratively so that the error decreases.

The key idea behind <b>backpropagation</b> is to propagate the error from the network's output layer back to its input layer, hence the name <b>"backpropagation"</b>. This process involves two main steps: forward propagation and backward propagation.
1. <b>Forward Propagation:</b>
    During forward propagation, the input data is fed into the network, and the activations of each neuron are computed layer by layer until reaching the output layer. Each activation is obtained by applying an activation function (such as sigmoid or ReLU) to the weighted sum of the neuron's inputs plus its bias.
2. <b>Backward Propagation:</b>
    Once the output layer's activations are computed, the error between the predicted output and the target output is calculated using an error function (e.g., mean squared error or cross-entropy). The algorithm then propagates this error back through the network, layer by layer, to compute the gradient of the error with respect to every weight and bias; these gradients are then used to update the parameters.
</p>
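The two steps above can be sketched in NumPy for a tiny network with one hidden layer, sigmoid activations and the squared error E = ½ Σ (ŷ − y)². This is a minimal illustration under those assumptions: the layer sizes, random seed and function names are chosen for the example and are not part of the original material. The last lines compare one backpropagated gradient with a finite-difference estimate, a common way to check a backpropagation implementation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W1, b1, W2, b2):
    """Forward propagation: weighted sum plus bias, then activation, layer by layer."""
    h = sigmoid(X @ W1 + b1)      # hidden-layer activations
    y_hat = sigmoid(h @ W2 + b2)  # output-layer activations
    return h, y_hat

def loss(y_hat, y):
    """Squared error E = 1/2 * sum((y_hat - y)^2)."""
    return 0.5 * np.sum((y_hat - y) ** 2)

def backward(X, y, W2, h, y_hat):
    """Backward propagation: apply the chain rule from the output layer back."""
    # dE/dz at the output: error times the sigmoid derivative y_hat * (1 - y_hat)
    delta2 = (y_hat - y) * y_hat * (1 - y_hat)
    dW2 = h.T @ delta2
    db2 = delta2.sum(axis=0, keepdims=True)
    # Send the output error back through W2 to the hidden layer
    delta1 = (delta2 @ W2.T) * h * (1 - h)
    dW1 = X.T @ delta1
    db1 = delta1.sum(axis=0, keepdims=True)
    return dW1, db1, dW2, db2

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))                          # 5 samples, 3 inputs
y = rng.integers(0, 2, size=(5, 1)).astype(float)    # binary targets
W1, b1 = rng.normal(size=(3, 4)), np.zeros((1, 4))   # 4 hidden neurons
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))   # 1 output neuron

h, y_hat = forward(X, W1, b1, W2, b2)
dW1, db1, dW2, db2 = backward(X, y, W2, h, y_hat)

# Central finite-difference estimate for one weight of the first layer
eps = 1e-6
W1_plus, W1_minus = W1.copy(), W1.copy()
W1_plus[0, 0] += eps
W1_minus[0, 0] -= eps
numeric = (loss(forward(X, W1_plus, b1, W2, b2)[1], y)
           - loss(forward(X, W1_minus, b1, W2, b2)[1], y)) / (2 * eps)
print(dW1[0, 0], numeric)  # the two values should agree to several decimal places
```

The backward pass reuses the activations `h` and `y_hat` stored during the forward pass, which is why backpropagation is much cheaper than estimating every gradient numerically.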

#
#### ⊗ Updating weights and bias in the neural network

<p style='text-align: justify;'>
In a <b>neural network</b>, the <b>weights and biases</b> are updated during the training process to optimize the network's performance. The <b>backpropagation</b> algorithm is commonly used to calculate the gradients of the weights and biases, which are then used to update their values. 

The <b>main steps</b> involved in the <b>backward propagation</b> are as follows:

1.  <b>Error Calculation:</b>
    The error at the output is measured with the error function, and its derivative with respect to the output layer's activations is computed. This derivative represents how sensitive the overall error is to each output activation.
    
2.  <b>Gradient Calculation:</b>
    The derivative of the error function with respect to the weights of each neuron is computed. This is done by applying the chain rule, which allows the calculation of the derivative of a composite function. The gradient is computed for each layer, starting from the output layer and moving backward. 

3.  <b>Weight Update:</b>
    The calculated gradients are then used to update the weights (and biases) of the network. Each weight is moved in the direction opposite to its gradient, which locally decreases the error: w ← w − η · ∂E/∂w, where the learning rate η determines the step size taken during weight updates.
    
4.  <b>Repeat:</b>
    A new forward pass and steps 1 to 3 are repeated for multiple iterations (epochs) or until a stopping criterion is met (e.g., a predefined maximum number of iterations or reaching a satisfactory level of error).

By iteratively adjusting the weights through forward and backward propagation, the <b>backpropagation</b> algorithm enables the network to learn the appropriate weights for making accurate predictions. This process of updating the weights based on error gradients is what allows the network to gradually improve its performance over time.
</p>
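The four steps can be illustrated on the smallest possible case: a single sigmoid neuron learning the logical AND function with full-batch gradient descent. This is a hedged sketch; the dataset, learning rate, tolerance and iteration cap are illustrative choices, not values from the original text. The loop stops either when the mean squared error falls below a tolerance or when a maximum number of iterations is reached.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [0], [0], [1]], dtype=float)  # logical AND

w = np.zeros((2, 1))       # weights
b = np.zeros((1, 1))       # bias
learning_rate = 1.0        # step size of each update
tolerance = 0.05           # stop when the MSE drops below this value...
max_iterations = 100_000   # ...or after this many iterations

for iteration in range(max_iterations):
    # Forward pass
    y_hat = sigmoid(X @ w + b)

    # 1. Error calculation: E = 1/(2n) * sum((y_hat - y)^2), so dE/dy_hat = (y_hat - y) / n
    mse = np.mean((y_hat - y) ** 2)
    if mse < tolerance:
        break
    dE_dyhat = (y_hat - y) / len(X)

    # 2. Gradient calculation with the chain rule (sigmoid'(z) = y_hat * (1 - y_hat))
    delta = dE_dyhat * y_hat * (1 - y_hat)
    grad_w = X.T @ delta
    grad_b = delta.sum(axis=0, keepdims=True)

    # 3. Weight update: step against the gradient, scaled by the learning rate
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b
    # 4. Repeat: the loop starts the next iteration with a new forward pass

print(f"stopped after {iteration} iterations, MSE = {mse:.4f}")
print(np.round(sigmoid(X @ w + b)).ravel())
```

Once the MSE is below 0.05, every output is within about 0.45 of its target, so rounding the outputs recovers the AND truth table.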

##### ⊗ Overfitting and underfitting 

<p style='text-align: justify;'>
Overfitting and underfitting are two common phenomena that occur during the training of machine learning models, including neural networks. They refer to the model's ability to generalize its predictions beyond the training data. Here's an explanation of each:

1. <b>Overfitting:</b>
   - Overfitting occurs when a model performs extremely well on the training data but fails to generalize to new, unseen data. In other words, the model "memorizes" the training examples instead of learning the underlying patterns or relationships that can be applied to new instances. A typical sign of overfitting is a very low training error combined with a much higher validation or test error.
   - <b>Causes of overfitting:</b>
     - Insufficient data: When the training dataset is small, the model may find it easier to memorize the examples than to learn meaningful patterns.
     - Model complexity: A model with high complexity, such as one with too many parameters or layers, can over-adapt to the training data and capture noise or irrelevant patterns.
     - Insufficient regularization: If regularization techniques, such as L1 or L2 regularization, are not applied or are too weak, the model may not be prevented from overfitting.
   - <b>Effects of overfitting:</b>
     - Poor generalization: An overfitted model does not generalize well to new, unseen data. Its performance on the training set is significantly better than on the validation or test set.
     - Increased variance: An overfitted model is highly sensitive to the particular training sample; training it on a slightly different dataset can produce very different predictions.
2. <b>Underfitting:</b>
   - Underfitting occurs when a model is unable to capture the underlying patterns in the training data. It fails to learn the essential relationships and performs poorly even on the training set itself. A typical sign of underfitting is a high error on both the training and the validation sets.
   - <b>Causes of underfitting:</b>
     - Insufficient model complexity: If the model is too simple or lacks the necessary capacity, it may struggle to represent the complexity of the data.
     - Insufficient training time: If the model is not trained for enough epochs or iterations, it may not converge to a satisfactory solution.
     - Insufficient feature representation: If the input features do not adequately capture the relevant information, the model may struggle to learn the patterns.
   - <b>Effects of underfitting:</b>
     - High bias: An underfitted model has high bias, meaning it oversimplifies the problem and makes systematic errors, resulting in poor performance on both training and test sets.
     - Inability to learn complex relationships: An underfitted model cannot capture the complex relationships present in the data, limiting its predictive capabilities.
</p>
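Both phenomena can be reproduced without a neural network. The sketch below swaps in polynomial regression (a simpler model family, chosen here only for illustration) and fits polynomials of increasing degree to a small noisy sample of a sine curve. The training error can only go down as the degree grows, while the validation error typically rises again once the model starts fitting the noise; the data, noise level, seed and degrees are assumptions made for this example.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(42)

def make_data(n):
    """Noisy samples of sin(2*pi*x) on [0, 1]."""
    x = rng.uniform(0, 1, n)
    return x, np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=n)

x_train, y_train = make_data(12)   # a small training set makes overfitting easy
x_val, y_val = make_data(200)      # held-out validation set

def mse(model, x, y):
    return np.mean((model(x) - y) ** 2)

results = {}
for degree in (1, 3, 9):
    # Least-squares polynomial fit of the given degree on the training data only
    model = Polynomial.fit(x_train, y_train, deg=degree)
    results[degree] = (mse(model, x_train, y_train), mse(model, x_val, y_val))
    print(f"degree {degree}: train MSE = {results[degree][0]:.4f}, "
          f"validation MSE = {results[degree][1]:.4f}")
```

A straight line (degree 1) cannot follow one period of a sine, which is the underfitting regime; degree 9 has almost as many coefficients as there are training points, which is where overfitting usually shows up.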

##### ⊗ How to avoid overfitting and underfitting during training

<b>Overfitting and underfitting</b> are common challenges in training neural networks. Here are some techniques to avoid or mitigate these issues:
   1. <b>Cross-Validation:</b>
      - Cross-validation is a powerful technique to estimate a model's performance and detect potential overfitting or underfitting. In its simplest form (hold-out validation), the available data is split into a training set, a validation set, and a test set: the model is trained on the training set and evaluated on the validation set. k-fold cross-validation goes further by splitting the data into k folds and training k times, each time holding out a different fold for validation, then averaging the results. By monitoring the model's performance on the validation data, you can assess whether the model is overfitting or underfitting. This helps in tuning hyperparameters, <b>adjusting model complexity, and ensuring good generalization</b>.
   2. <b>Regularization:</b>
      - Regularization is primarily a tool against overfitting. It introduces a penalty term to the loss function during training, discouraging excessive reliance on individual features or overly complex model representations. Two common regularization techniques are L1 regularization (Lasso) and L2 regularization (Ridge). L1 regularization encourages sparsity in the model by adding the absolute values of the weights to the loss function, while L2 regularization adds the squared values of the weights. <b>Both techniques control the magnitude of the weights and help prevent overfitting; however, the penalty strength must be tuned, because a penalty that is too strong pushes the weights toward zero and can cause underfitting</b>.
   3. <b>Data Augmentation:</b>
      - Data augmentation is particularly useful for avoiding overfitting in situations where the <b>available training data is limited</b>. It involves applying various transformations, such as rotation, translation, scaling, or adding noise, to the existing training examples. <b>By generating new synthetic examples, data augmentation increases the diversity and quantity of training data, reducing the risk of overfitting</b>. This technique enables the model to generalize better by exposing it to a wider range of variations and scenarios.
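The three techniques can be sketched with plain NumPy. The helper names below (`k_fold_indices`, `l2_regularized_step`, `augment_with_noise`) are hypothetical and written only for this illustration; in practice, scikit-learn's `KFold` provides a tested cross-validation splitter. The L2 step shows why this penalty is often called weight decay: adding (λ/2)·‖w‖² to the loss adds λ·w to the gradient, which shrinks every weight a little at each update. The augmentation helper only covers the "adding noise" transformation mentioned above.

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

def l2_regularized_step(w, grad_loss, learning_rate, lam):
    """One gradient-descent step on loss + (lam / 2) * ||w||^2.

    The penalty contributes lam * w to the gradient ("weight decay").
    """
    return w - learning_rate * (grad_loss + lam * w)

def augment_with_noise(X, y, copies=2, noise_std=0.05, seed=0):
    """Create extra training examples by adding small Gaussian noise to X."""
    rng = np.random.default_rng(seed)
    X_new = [X] + [X + rng.normal(scale=noise_std, size=X.shape) for _ in range(copies)]
    y_new = [y] * (copies + 1)  # the labels are unchanged by the noise
    return np.concatenate(X_new), np.concatenate(y_new)

# Usage examples
splits = list(k_fold_indices(n_samples=10, k=5))
for train_idx, val_idx in splits:
    print("train:", np.sort(train_idx), "validation:", np.sort(val_idx))

w = np.array([1.0, -2.0])
w_next = l2_regularized_step(w, grad_loss=np.zeros(2), learning_rate=0.1, lam=0.5)
print(w_next)  # with a zero loss gradient, the penalty alone shrinks w by 5%

X_aug, y_aug = augment_with_noise(np.ones((4, 3)), np.arange(4))
print(X_aug.shape, y_aug.shape)  # (12, 3) (12,)
```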

#
#### ⊗ Application: Backpropagation Algorithm (Code)

<p style='text-align: justify;'>
    Build a simple Backpropagation algorithm using the concepts studied.
</p>


import numpy as np

# Activation function (sigmoid)
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

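# Derivative of the sigmoid, written in terms of its OUTPUT s = sigmoid(z):
# d/dz sigmoid(z) = s * (1 - s). Callers must pass activations, not raw inputs.
def sigmoid_derivative(s):
    return s * (1 - s)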

# Define the neural network class
class NeuralNetwork:
    def __init__(self, input_size, hidden_size, output_size):
        self.input_size = input_size    # Number of input features
        self.hidden_size = hidden_size  # Neurons in the hidden layer (between the input and output layers)
        self.output_size = output_size  # Number of output neurons

        # Initialize weights with random values from a standard normal distribution
        self.weights1 = np.random.randn(self.input_size, self.hidden_size)
        self.weights2 = np.random.randn(self.hidden_size, self.output_size)

        # Initialize biases with zeros
        self.bias1 = np.zeros((1, self.hidden_size))
        self.bias2 = np.zeros((1, self.output_size))

    # Forward propagation: weighted sum plus bias, then sigmoid, layer by layer
    def forward(self, X):
        self.hidden_layer_activation = sigmoid(np.dot(X, self.weights1) + self.bias1)
        self.output = sigmoid(np.dot(self.hidden_layer_activation, self.weights2) + self.bias2)

    # Backward propagation (gradient descent on the squared error)
    def backward(self, X, y, learning_rate):
        # Output layer: error times the sigmoid derivative (computed from the activations)
        error = y - self.output
        delta_output = error * sigmoid_derivative(self.output)

        # Hidden layer: send the output deltas back through weights2 (chain rule)
        error_hidden = delta_output.dot(self.weights2.T)
        delta_hidden = error_hidden * sigmoid_derivative(self.hidden_layer_activation)

        # Update weights and biases. For E = 1/2 * sum((y - output)^2), error = y - output
        # equals -dE/d(output), so the deltas are negative gradients and the updates use +=.
        self.weights2 += self.hidden_layer_activation.T.dot(delta_output) * learning_rate
        self.bias2 += np.sum(delta_output, axis=0, keepdims=True) * learning_rate
        self.weights1 += X.T.dot(delta_hidden) * learning_rate
        self.bias1 += np.sum(delta_hidden, axis=0, keepdims=True) * learning_rate

    def train(self, X, y, epochs, learning_rate):
        for epoch in range(epochs):
            # Full-batch training: one forward and one backward pass over ALL examples per epoch
            self.forward(X)
            self.backward(X, y, learning_rate)

    def predict(self, X):
        # Forward propagation for prediction
        self.forward(X)
        return self.output


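# Example usage: the XOR problem, which is not linearly separable
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([[0], [1], [1], [0]])

# Create and train the neural network
# (weights are initialized randomly, so the outputs differ slightly between runs)
nn = NeuralNetwork(input_size=2, hidden_size=4, output_size=1)
nn.train(X, y, epochs=10000, learning_rate=0.1)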

# Make predictions
predictions = nn.predict(X)
print(predictions)

[[0.00160856]
 [0.9973776 ]
 [0.9966502 ]
 [0.00406367]]


##### Discussion: What happened?

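- The predictions are close to the XOR targets (0, 1, 1, 0): about 0.002 and 0.004 for the inputs [0, 0] and [1, 1], and about 0.997 for [0, 1] and [1, 0]. The network learned XOR, a function that is not linearly separable and therefore cannot be learned by a single-layer perceptron; the hidden layer, trained with backpropagation, makes this possible.

- The outputs are never exactly 0 or 1 because the sigmoid only approaches these values asymptotically; to obtain class labels, the outputs are usually thresholded (e.g., at 0.5). Since the weights are initialized randomly, another run will give slightly different values, and training may occasionally get stuck in a poor solution.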
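One way to read the output above is to turn the sigmoid probabilities into class labels with a 0.5 threshold and compare them with the XOR targets. This small sketch copies the predictions printed by the run above (a new run will give slightly different numbers, since the weights are initialized randomly); the 0.5 threshold is a common convention, assumed here.

```python
import numpy as np

y = np.array([[0], [1], [1], [0]])  # XOR targets from the example
predictions = np.array([[0.00160856], [0.9973776], [0.9966502], [0.00406367]])

labels = (predictions >= 0.5).astype(int)  # threshold the sigmoid outputs
accuracy = np.mean(labels == y)
print(labels.ravel(), accuracy)  # [0 1 1 0] 1.0
```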

#
## ☆ Challenge: Image Classification ☆

Consider the following problem:

<p style='text-align: justify;'>
    In this exercise, you must create an MLP to classify the digits shown in the image below, using backpropagation as the algorithm for updating the MLP's weights and biases.
</p>
<p style='text-align: justify;'>
Create an MLP whose input, hidden, and output layers fit the problem presented. Train the model with the backpropagation algorithm and tune the hyperparameters to achieve the best accuracy on the validation set.
</p>


<img src="./images/mnist1.png" style="width: 600px;">

### ☆ Solution ☆ 

<p style='text-align: justify;'>
    Develop the solution to the problem here.
</p>
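As a possible starting point (not a full solution), the main change from the XOR example is the output layer: with 10 digit classes, a common choice is one output neuron per class, a softmax activation, one-hot targets and the cross-entropy loss. With that combination, the gradient of the loss with respect to the output pre-activations simplifies to the difference between the predicted probabilities and the one-hot targets, with no separate sigmoid-derivative factor (note that the `backward` method above uses the opposite sign convention, `y - output`, together with `+=` updates). The helper names and the batch-averaging convention below are assumptions of this sketch.

```python
import numpy as np

def one_hot(labels, n_classes=10):
    """Convert integer labels (e.g. digits 0-9) into one-hot row vectors."""
    encoded = np.zeros((len(labels), n_classes))
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

def softmax(z):
    """Row-wise softmax; subtracting the row max avoids overflow in exp."""
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, targets):
    """Mean cross-entropy over the batch; the small constant avoids log(0)."""
    return -np.mean(np.sum(targets * np.log(probs + 1e-12), axis=1))

def output_delta(z, targets):
    """Gradient of the mean cross-entropy w.r.t. the output pre-activations z."""
    return (softmax(z) - targets) / len(z)

# Usage on a tiny random batch of 3 samples
rng = np.random.default_rng(1)
z = rng.normal(size=(3, 10))            # output-layer pre-activations
targets = one_hot(np.array([3, 7, 0]))
probs = softmax(z)
print(probs.sum(axis=1))                # each row sums to 1
print(cross_entropy(probs, targets))
```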


#
## Summary

<p style='text-align: justify;'>
Write a summary of what was covered in this notebook.
</p>

## Clear the Memory

Before moving on, please execute the following cell to clear up the CPU memory. This is required to move on to the next notebook.

import IPython
app = IPython.Application.instance()
app.kernel.do_shutdown(True)

{'status': 'ok', 'restart': True}

## Next

XXXXXXXXXXXXXXXXXXXXXXXxx [_02-percepetron-training.ipynb_](03-percepetron-training.ipynb)