# Neural network basics

Welcome to the `04_neural_network_basics` notebook. This part of the portfolio is designed to introduce fundamental concepts and techniques in PyTorch, with a particular emphasis on building and understanding neural networks.

Throughout this notebook, I'll explore essential topics such as setting up the environment, defining neural network architectures, and implementing both forward and backward propagation. I'll also cover training procedures, model evaluation, and techniques to save and load trained models.

By working through these exercises, the aim is to gain practical experience in constructing and optimizing neural networks, forming a solid foundation for more advanced machine learning endeavors.

## Table of contents

1. [Introduction](#introduction)
2. [Understanding neural networks](#understanding-neural-networks)
3. [Setting up the environment](#setting-up-the-environment)
4. [Building a neural network](#building-a-neural-network)
5. [Forward propagation](#forward-propagation)
6. [Loss function](#loss-function)
7. [Backpropagation](#backpropagation)
8. [Training the neural network](#training-the-neural-network)
9. [Evaluating the model](#evaluating-the-model)
10. [Saving and loading the model](#saving-and-loading-the-model)
11. [Optimizations](#optimizations)
12. [Handling real-world data](#handling-real-world-data)
13. [Conclusion](#conclusion)
14. [Further exercises](#further-exercises)

## Understanding neural networks

Neural networks are a foundational technique in machine learning and artificial intelligence, used to identify patterns and relationships in data. Unlike traditional algorithms, which often rely on predefined rules, neural networks learn directly from examples, which makes them adaptable to many problems. They can handle a variety of tasks, from classification to regression, by adjusting their parameters during training so that the error on the training examples decreases.

### Key concepts

#### 1. Neurons and layers
Neural networks are inspired by the structure and function of the human brain, consisting of interconnected units called neurons. These neurons are organized into layers: the input layer, hidden layers, and the output layer.

- **Input layer**: The first layer in the network that receives the input data. Each neuron in this layer represents a feature of the input data.
- **Hidden layers**: Layers between the input and output layers. These layers transform the inputs they receive, allowing the network to learn complex patterns. A network with several hidden layers is called a deep neural network, which is where the term deep learning comes from.
- **Output layer**: The final layer that produces the output of the network. The number of neurons in this layer corresponds to the number of desired outputs.

#### 2. Activation functions
Activation functions introduce non-linearity into the network, enabling it to learn and model complex relationships in the data. Without them, any stack of layers would collapse into a single linear (affine) transformation, so the network could only model linear relationships.

- **ReLU (Rectified Linear Unit)**: Outputs the input directly if it is positive; otherwise, it outputs zero. It is widely used due to its simplicity and effectiveness.
- **Sigmoid**: Compresses the input to a range between 0 and 1. It is often used in binary classification problems.
- **Tanh (Hyperbolic tangent)**: Compresses the input to a range between -1 and 1, so its outputs are zero-centered. Like sigmoid, it saturates for inputs of large magnitude, which can lead to vanishing gradients.
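
As a minimal sketch, the three activation functions listed above can be written in plain Python. The function names are illustrative and PyTorch is deliberately left out so that the arithmetic stays visible; the branch inside `sigmoid` is one common way to avoid overflow for large negative inputs.

```python
import math


def relu(z: float) -> float:
    """Return z if it is positive, otherwise 0."""
    return max(0.0, z)


def sigmoid(z: float) -> float:
    """Squash z into (0, 1); branch on the sign to avoid overflow in exp."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def tanh(z: float) -> float:
    """Squash z into (-1, 1), centered on 0."""
    return math.tanh(z)


# Print each activation for a negative, zero, and positive input.
for z in (-2.0, 0.0, 2.0):
    print(z, relu(z), round(sigmoid(z), 4), round(tanh(z), 4))
```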

#### 3. Forward propagation
Forward propagation is the process of passing input data through the network to obtain an output. During this process, each neuron computes a weighted sum of its inputs, adds a bias term, and applies an activation function to produce its output. The output of one layer becomes the input to the next layer, and this process continues until the final output layer.

#### 4. Loss function
The loss function measures the difference between the predicted outputs and the actual targets. It quantifies how well the neural network is performing. The goal of training the network is to minimize this loss.

- **Mean Squared Error (MSE)**: Commonly used for regression tasks, it calculates the average squared difference between predicted and actual values.
- **Cross-entropy loss**: Used for classification tasks, it measures the difference between the predicted probability distribution and the actual distribution.

#### 5. Backpropagation
Backpropagation is the process of computing the gradient of the loss function with respect to each weight by applying the chain rule backward through the network, from the output layer to the input layer. The weights are then adjusted in the opposite direction of the gradient. With a sufficiently small learning rate, each such step tends to reduce the loss, although a decrease is not guaranteed on every iteration.

#### 6. Optimizers
Optimizers are algorithms that adjust the weights of the network to minimize the loss. They use the gradients calculated during backpropagation to update the weights.

- **Stochastic Gradient Descent (SGD)**: Updates the weights using the gradient computed on a small, randomly chosen subset of the data (a mini-batch). Each update is much cheaper than one computed on the full dataset, at the cost of a noisier gradient estimate.
- **Adam (Adaptive Moment Estimation)**: An extension of SGD that adapts the step size for each parameter using running averages of the gradient and of its square. It often works well with little learning-rate tuning.
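
To make the difference between the two update rules concrete, here is a rough sketch that minimizes a hypothetical one-parameter function, f(w) = (w - 3)^2, with both. The function, the `state` dictionary layout, and the learning rates are assumptions made for illustration; the Adam defaults (`beta1=0.9`, `beta2=0.999`, `eps=1e-8`) are the commonly quoted ones.

```python
import math


def sgd_step(w: float, grad: float, lr: float) -> float:
    """Plain gradient step: move against the gradient, scaled by lr."""
    return w - lr * grad


def adam_step(w, grad, state, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a single scalar parameter."""
    state["t"] += 1
    # Running averages of the gradient and of the squared gradient.
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    # Bias correction compensates for the zero initialization of m and v.
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return w - lr * m_hat / (math.sqrt(v_hat) + eps)


def grad_f(w: float) -> float:
    """Gradient of the toy objective f(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)


w_sgd, w_adam = 0.0, 0.0
state = {"t": 0, "m": 0.0, "v": 0.0}
for _ in range(200):
    w_sgd = sgd_step(w_sgd, grad_f(w_sgd), lr=0.1)
    w_adam = adam_step(w_adam, grad_f(w_adam), state, lr=0.1)

print(w_sgd, w_adam)  # both should end up close to the minimum at w = 3
```

On the very first step the bias-corrected ratio is close to the sign of the gradient, so Adam moves by roughly `lr` regardless of how large the gradient is. That is the per-parameter step-size adaptation described above.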

#### 7. Training and validation
Training a neural network involves iteratively feeding data through the network, calculating the loss, and updating the weights. This process is repeated for a specified number of epochs (complete passes through the training dataset).

- **Training set**: The subset of data used to train the model.
- **Validation set**: A separate subset of data used to evaluate the model's performance during training, helping to tune hyperparameters and prevent overfitting.
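
A simple way to obtain the two subsets is to shuffle the sample indices once and cut the list. The sketch below assumes an in-memory list of samples; `train_val_split`, the 80/20 ratio, and the toy data are illustrative choices.

```python
import random


def train_val_split(samples, val_fraction=0.2, seed=0):
    """Shuffle indices with a fixed seed and split them into train/validation."""
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    rng.shuffle(indices)
    n_val = int(len(samples) * val_fraction)
    val = [samples[i] for i in indices[:n_val]]
    train = [samples[i] for i in indices[n_val:]]
    return train, val


# Toy dataset of (input, target) pairs.
data = [(x, 2 * x) for x in range(10)]
train_set, val_set = train_val_split(data, val_fraction=0.2)
print(len(train_set), len(val_set))
```

Fixing the seed keeps the split reproducible between runs, which matters when comparing hyperparameter settings against the same validation set.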

#### 8. Overfitting and underfitting
- **Overfitting**: Occurs when the model fits the training data too closely, capturing noise and specific patterns that do not generalize to new data. The typical symptom is a low training loss combined with poor performance on the validation set.
- **Underfitting**: Occurs when the model is too simple to capture the underlying patterns in the data, leading to poor performance on both the training and validation sets.

#### 9. Regularization techniques
Regularization techniques are used to prevent overfitting by adding constraints or penalties to the model.

- **Dropout**: Randomly drops a fraction of neurons during training, forcing the network to learn redundant representations and improving generalization.
- **L2 regularization (Ridge)**: Adds a penalty proportional to the sum of the squares of the weights, discouraging large weights and promoting simpler models.

#### 10. Model evaluation
After training, the model's performance is evaluated on a test set that was not seen during training. This provides an unbiased estimate of how well the model generalizes to new data.

- **Accuracy**: The proportion of correctly classified instances out of the total instances.
- **Precision, recall, and F1 score**: Metrics that provide deeper insights into the model's performance, especially for imbalanced datasets.
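
The following sketch computes these metrics for a binary problem from lists of 0/1 labels. It assumes that class 1 is the positive class and returns 0.0 when a denominator is zero; both are conventions chosen here for illustration.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for 0/1 labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Imbalanced toy example: only 2 of 10 samples are positive.
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

In this toy example the accuracy is 0.9 even though half of the positive samples are missed (recall 0.5), which is why accuracy alone can be misleading on imbalanced data.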

### Maths

#### 1. Structure of neural networks

##### Layers
- **Input layer**: This is the first layer of the network, which receives the raw input data. Each neuron in this layer corresponds to one feature of the input data.
- **Hidden layers**: These layers are located between the input and output layers. They perform computations and extract features from the input data. A neural network can have multiple hidden layers, which allows it to learn complex patterns.
- **Output layer**: This is the final layer of the network, which produces the output. The number of neurons in this layer depends on the type of problem (e.g., one neuron for binary classification, multiple neurons for multi-class classification).

##### Neurons
Each neuron in a neural network computes a weighted sum of its inputs, adds a bias term, and applies an activation function to produce its output.

#### 2. Forward propagation

Forward propagation is the process by which input data passes through the network to generate an output. It involves the following steps:

##### Weighted sum
For a given neuron $ j $ in layer $ l $, the pre-activation $ z_j^{(l)} $ (the weighted input to the neuron) is computed as:
$$ z_j^{(l)} = \sum_{i=1}^{n} w_{ij}^{(l-1)} a_i^{(l-1)} + b_j^{(l)} $$
where:
- $ w_{ij}^{(l-1)} $ is the weight connecting neuron $ i $ in layer $ l-1 $ to neuron $ j $ in layer $ l $.
- $ a_i^{(l-1)} $ is the activation of neuron $ i $ in layer $ l-1 $.
- $ b_j^{(l)} $ is the bias term for neuron $ j $ in layer $ l $.
- $ n $ is the number of neurons in layer $ l-1 $.

##### Activation function
The output $ a_j^{(l)} $ of neuron $ j $ in layer $ l $ is obtained by applying an activation function $ f $ to the weighted sum:
$$ a_j^{(l)} = f(z_j^{(l)}) $$

Common activation functions include:
- **Sigmoid**: $ f(z) = \frac{1}{1 + e^{-z}} $
- **Tanh**: $ f(z) = \tanh(z) $
- **ReLU (Rectified Linear Unit)**: $ f(z) = \max(0, z) $
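
The weighted sum and the activation step can be combined into a forward pass for one fully connected layer. This is a minimal sketch in plain Python: `dense_forward` is a hypothetical helper, `weights[i][j]` follows the $ w_{ij} $ indexing used above (from neuron $ i $ in the previous layer to neuron $ j $ in this layer), and the numbers are made up.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def dense_forward(a_prev, weights, biases, activation):
    """Compute z_j = sum_i w_ij * a_i + b_j and a_j = f(z_j) for one layer."""
    n_in, n_out = len(a_prev), len(biases)
    z = [
        sum(weights[i][j] * a_prev[i] for i in range(n_in)) + biases[j]
        for j in range(n_out)
    ]
    a = [activation(z_j) for z_j in z]
    return z, a


a0 = [1.0, 2.0]                      # activations of the previous layer
W = [[0.5, -1.0, 0.0],               # weights leaving input neuron 0
     [0.25, 0.5, 0.0]]               # weights leaving input neuron 1
b = [0.0, 0.0, 0.0]                  # one bias per neuron in this layer

z1, a1 = dense_forward(a0, W, b, sigmoid)
print(z1)  # [1.0, 0.0, 0.0]
print(a1)
```

Chaining several calls, with each layer's `a` passed as the next layer's `a_prev`, gives the full forward pass of a multi-layer network.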

#### 3. Loss function

The loss function quantifies the difference between the predicted output and the actual target. The goal of training a neural network is to minimize this loss. Common loss functions include:

- **Mean Squared Error (MSE)**: Used for regression tasks, defined as:
  $$ \text{MSE} = \frac{1}{m} \sum_{i=1}^{m} (y_i - \hat{y}_i)^2 $$
  where $ y_i $ is the true value, $ \hat{y}_i $ is the predicted value, and $ m $ is the number of samples.

- **Cross-entropy loss**: Used for classification tasks, defined as:
  $$ \text{Cross-Entropy} = -\frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{k} y_{ij} \log(\hat{y}_{ij}) $$
  where $ m $ is the number of samples, $ k $ is the number of classes, $ y_{ij} $ is a binary indicator (1 if class $ j $ is the correct label for input $ i $, 0 otherwise), and $ \hat{y}_{ij} $ is the predicted probability that input $ i $ belongs to class $ j $.
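
Both formulas translate almost line by line into Python. In this sketch the targets for cross-entropy are assumed to be one-hot rows and the predictions are assumed to already be probabilities; the `eps` clamp is an added safeguard against `log(0)` and is not part of the formula itself.

```python
import math


def mse(y_true, y_pred):
    """Mean of the squared differences over m samples."""
    m = len(y_true)
    return sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred)) / m


def cross_entropy(y_true, y_pred, eps=1e-12):
    """Average of -sum_j y_ij * log(p_ij) over m samples (one-hot targets)."""
    m = len(y_true)
    total = 0.0
    for row_true, row_pred in zip(y_true, y_pred):
        for y, p in zip(row_true, row_pred):
            total += y * math.log(max(p, eps))  # clamp to avoid log(0)
    return -total / m


print(mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))

targets = [[1, 0], [0, 1]]
confident = [[0.9, 0.1], [0.2, 0.8]]
uncertain = [[0.5, 0.5], [0.5, 0.5]]
print(cross_entropy(targets, confident), cross_entropy(targets, uncertain))
```

Predictions that put more probability on the correct class give a lower cross-entropy, which is the behavior the training procedure relies on.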

#### 4. Backpropagation

Backpropagation computes the gradient of the loss function with respect to each weight. These gradients are then used to update the weights with gradient descent so that the loss decreases.

##### Gradient descent
The weight update rule for gradient descent is:
$$ w_{ij} \leftarrow w_{ij} - \eta \frac{\partial L}{\partial w_{ij}} $$
where:
- $ \eta $ is the learning rate, a hyperparameter that controls the step size of the update.
- $ \frac{\partial L}{\partial w_{ij}} $ is the partial derivative of the loss function with respect to the weight $ w_{ij} $.

##### Calculating gradients
The gradients are computed using the chain rule of calculus. For a given weight $ w_{ij} $, the gradient is calculated as:
$$ \frac{\partial L}{\partial w_{ij}} = \frac{\partial L}{\partial a_j} \cdot \frac{\partial a_j}{\partial z_j} \cdot \frac{\partial z_j}{\partial w_{ij}} $$

The partial derivatives are:
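- $ \frac{\partial L}{\partial a_j} $: The derivative of the loss with respect to the neuron's activation. For an output neuron it follows directly from the loss function; for a hidden neuron it is obtained by summing the contributions propagated back from the neurons in the next layer.
- $ \frac{\partial a_j}{\partial z_j} $: The derivative of the activation function.
- $ \frac{\partial z_j}{\partial w_{ij}} $: The derivative of the weighted sum with respect to the weight, which equals $ a_i $, the activation of neuron $ i $ that feeds into this weight.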
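
The chain rule can be checked numerically on a single neuron. The sketch below assumes one sigmoid neuron with a squared-error loss $ L = (a - y)^2 $; the input, target, and parameter values are arbitrary. It multiplies the three factors, compares the result with a finite-difference estimate, and then applies one gradient descent update.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def loss(w: float, b: float, x: float, y: float) -> float:
    """Squared error of a single sigmoid neuron."""
    a = sigmoid(w * x + b)
    return (a - y) ** 2


def grad_w(w: float, b: float, x: float, y: float) -> float:
    """dL/dw as the product of the three chain-rule factors."""
    a = sigmoid(w * x + b)
    dL_da = 2.0 * (a - y)      # derivative of the loss w.r.t. the activation
    da_dz = a * (1.0 - a)      # derivative of the sigmoid
    dz_dw = x                  # derivative of w*x + b w.r.t. w
    return dL_da * da_dz * dz_dw


w, b, x, y = 0.5, 0.1, 2.0, 1.0
analytic = grad_w(w, b, x, y)

# Central finite difference as an independent check of the analytic gradient.
h = 1e-6
numeric = (loss(w + h, b, x, y) - loss(w - h, b, x, y)) / (2.0 * h)
print(analytic, numeric)

# One gradient descent update with learning rate eta.
eta = 0.1
w_new = w - eta * analytic
print(loss(w, b, x, y), loss(w_new, b, x, y))
```

A gradient check like this is a cheap way to catch mistakes in a hand-written backward pass before trusting it in training.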

#### 5. Training the network

Training a neural network involves the following steps:
1. **Initialize weights**: Set the initial values of the weights, typically using small random values.
2. **Forward propagation**: Compute the outputs of the network for a batch of input data.
3. **Compute loss**: Calculate the loss using the predicted outputs and the true targets.
4. **Backpropagation**: Compute the gradients of the loss with respect to the weights.
5. **Update weights**: Adjust the weights using gradient descent.
6. **Repeat**: Iterate over the training data for a specified number of epochs or until the loss converges.
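
The six steps above can be sketched as a tiny training loop. To keep the gradients easy to verify by hand, this example assumes a single linear neuron (no activation), MSE loss, and full-batch gradient descent on made-up data generated from y = 2x + 1; the function name, learning rate, and epoch count are illustrative.

```python
import random


def train_linear_neuron(xs, ys, lr=0.05, epochs=500, seed=0):
    """Fit y ~ w*x + b with full-batch gradient descent on the MSE loss."""
    rng = random.Random(seed)
    # 1. Initialize weights with small random values.
    w = rng.uniform(-0.1, 0.1)
    b = 0.0
    m = len(xs)
    history = []
    for _ in range(epochs):
        # 2. Forward propagation.
        preds = [w * x + b for x in xs]
        # 3. Compute the loss.
        loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / m
        history.append(loss)
        # 4. Backpropagation: gradients of the MSE w.r.t. w and b.
        grad_w = sum(2.0 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / m
        grad_b = sum(2.0 * (p - y) for p, y in zip(preds, ys)) / m
        # 5. Update the weights.
        w -= lr * grad_w
        b -= lr * grad_b
    # 6. The loop above repeats for the requested number of epochs.
    return w, b, history


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]
w, b, history = train_linear_neuron(xs, ys)
print(w, b, history[0], history[-1])
```

The learned `w` and `b` should approach 2 and 1, and the recorded `history` makes it easy to plot or inspect how the loss falls over the epochs.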

#### 6. Regularization techniques

To prevent overfitting, various regularization techniques can be applied:

- **L2 regularization (Ridge)**: Adds a penalty proportional to the sum of the squares of the weights to the loss function.
  $$ L_{\text{ridge}} = L + \lambda \sum_{j} w_j^2 $$
  where $ \lambda $ is the regularization parameter.

- **L1 regularization (Lasso)**: Adds a penalty proportional to the sum of the absolute values of the weights to the loss function.
  $$ L_{\text{lasso}} = L + \lambda \sum_{j} |w_j| $$

- **Dropout**: Randomly sets a fraction of the neuron activations to zero during training, which helps prevent the network from becoming too reliant on any single neuron and improves generalization. Dropout is switched off when the model is evaluated.
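
As a rough sketch, the two penalties and dropout look like this in plain Python. The helper names are illustrative. The dropout function uses the "inverted" variant, which scales the surviving activations by 1 / (1 - p) during training so that their expected value is unchanged; that scaling is an assumption added here and is not stated in the description above.

```python
import random


def l2_penalty(weights, lam):
    """lambda * sum of squared weights, to be added to the loss."""
    return lam * sum(w * w for w in weights)


def l1_penalty(weights, lam):
    """lambda * sum of absolute weights, to be added to the loss."""
    return lam * sum(abs(w) for w in weights)


def dropout(activations, p, rng, training=True):
    """Inverted dropout: zero each activation with probability p (p < 1)."""
    if not training or p == 0.0:
        return list(activations)  # no-op at evaluation time
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


weights = [1.0, -2.0]
print(l2_penalty(weights, lam=0.1), l1_penalty(weights, lam=0.1))

rng = random.Random(0)
dropped = dropout([1.0] * 10, p=0.5, rng=rng)
print(dropped)
```

In a training loop the chosen penalty is added to the data loss before computing gradients, while dropout is applied to a layer's activations during the forward pass.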

## Setting up the environment

## Building a neural network

## Forward propagation

## Loss function

## Backpropagation

## Training the neural network

## Evaluating the model

## Saving and loading the model

## Optimizations

## Handling real-world data

## Conclusion

## Further exercises