# 02. Neural Networks

In this section, we will learn about **neural networks**, focusing on their architecture and training process.

In the next notebook, [03. Neural Networks in PyTorch](03_neural_networks_in_pytorch.ipynb), we will implement a neural network using PyTorch.

Let's say you want to predict if you will pass an exam `(1)` or not `(0)`. You have the following data:

- Hours studied (x1)
- How smart you are (x2)
- Previous knowledge (x3)
- Name (x4)

This means that our neural network will take four inputs: `x1`, `x2`, `x3`, and `x4`. The output will be a single value, either `0` (fail) or `1` (pass).

As you can probably guess, not all of these features are useful for predicting the outcome. For example, your name is not a good predictor of whether you will pass the exam or not.

Let's walk through the steps of how the neural network will work.

> Note: I will include videos and articles that explain the concepts in more detail in the last section of this notebook.


## Neural Network Structure

A neural network is made up of layers of neurons. Each neuron takes inputs, applies a transformation, and produces an output. The output of one layer becomes the input to the next layer.

Using our previous example, we can represent the neural network structure as follows:

<img src="../09_images/01-neural_network_structure.png" alt="Neural Network Structure" width="800">

The neural network has:

- **Input Layer**: The first layer that takes the inputs `x1`, `x2`, `x3`, and `x4`.
- **Hidden Layer**: The layer that processes the inputs and applies transformations. In this case, we have one neuron in the hidden layer, but we could have more neurons and more layers for a more complex model.
- **Output Layer**: The final layer that produces the output. In this case, we have one output neuron that gives the final prediction.

Now let's investigate how the neural network works step by step.


### Step 1: Initializing Weights and Biases

In a neural network, we have **[weights](https://www.geeksforgeeks.org/deep-learning/the-role-of-weights-and-bias-in-neural-networks/)** and **[biases](https://www.turing.com/kb/necessity-of-bias-in-neural-networks)**. Both are parameters that the model learns during training: weights scale each input, and biases are added to the weighted sum of inputs to help the model fit the data better.

The weights and biases are initialized randomly at the beginning of the training process. For our example, we will have four weights (one for each input) and one bias.

The weights measure the importance of each input feature, while the bias allows the model to shift the output up or down. As mentioned earlier, not all features are useful for predicting the outcome, so the weights will adjust accordingly during training.

For example, if the weight for `x4` (name) ends up close to zero after training, the name has little influence on the prediction. In effect, the model has learned to ignore it.
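As a minimal sketch of this step, the snippet below creates four random weights and one random bias using only Python's standard library. The value range `[-0.5, 0.5]` and the fixed seed are assumptions made for illustration; real frameworks use their own initialization schemes.

```python
import random

random.seed(0)  # fixed seed (an assumption) so repeated runs give the same values

n_inputs = 4  # x1, x2, x3, x4 from the exam example

# One small random weight per input feature.
# The range [-0.5, 0.5] is an arbitrary choice for this sketch.
weights = [random.uniform(-0.5, 0.5) for _ in range(n_inputs)]

# A single bias, also initialized randomly.
bias = random.uniform(-0.5, 0.5)

print("weights:", weights)
print("bias:", bias)
```

At this point the values carry no meaning yet. Training is what turns them into useful numbers.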

### Step 2: Forward Pass

In the forward pass, the inputs are multiplied by their corresponding weights, and the bias is added to the weighted sum. This is done for each neuron in the hidden layer.

Each neuron's weighted sum is then passed through an activation function, which introduces non-linearity into the model. This allows the neural network to learn complex patterns in the data (this will be explained in more detail in a later section).
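The forward pass for a single neuron can be sketched in a few lines of plain Python. The input values, weights, and bias below are made up, the features are assumed to be already converted to numbers between 0 and 1, and the sigmoid activation is one common choice rather than the only option.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(inputs, weights, bias):
    """Weighted sum of the inputs plus the bias, followed by the activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(z)

# Hypothetical, already-scaled values for x1..x4.
inputs = [0.8, 0.6, 0.5, 0.1]
# Hypothetical weights; the weight of 0.0 makes the neuron ignore x4.
weights = [0.5, -0.2, 0.1, 0.0]
bias = 0.1

prediction = forward(inputs, weights, bias)
print(prediction)  # a value between 0 and 1, readable as a pass probability
```

Because the sigmoid output always lies between 0 and 1, it can be read as the estimated probability of passing, with values above 0.5 rounded to `1` (pass).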

<img src="../09_images/01-weight_initialization.png" alt="Initializing Weights and Biases" width="800">


### Step 3: Calculating Loss

After the forward pass, we calculate the loss: a single number, produced by a loss function, that quantifies the difference between the predicted output and the true output. The lower the loss, the better the model's predictions match the actual labels.
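To make this concrete, here is a small sketch of binary cross-entropy, a loss function often used for 0/1 targets like pass or fail. Choosing this particular loss is an assumption for illustration, and the predicted probabilities are made up.

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Loss for one example: y_true is 0 or 1, y_pred is a probability."""
    # Clamp the prediction so that log(0) can never occur.
    y_pred = min(max(y_pred, eps), 1.0 - eps)
    return -(y_true * math.log(y_pred) + (1 - y_true) * math.log(1.0 - y_pred))

# The student actually passed (label 1).
loss_good = binary_cross_entropy(1, 0.9)  # confident and correct prediction
loss_bad = binary_cross_entropy(1, 0.1)   # confident but wrong prediction

print(loss_good, loss_bad)
```

The confident wrong prediction receives a much larger loss than the confident correct one, which is the signal training uses to improve the model.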

### Step 4: Backward Pass

In the backward pass, we calculate the gradients of the loss with respect to the weights and biases. This is done using **backpropagation**, which is a method for calculating the gradients efficiently.

Each gradient tells us how much the loss would change for a small change in the corresponding weight or bias. We use these gradients to update the weights and biases in the direction that reduces the loss.
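The sketch below computes these gradients by hand for a simplified setting: a single sigmoid neuron trained with binary cross-entropy loss (both assumptions, chosen because the chain rule then collapses to a short formula). For that combination, the gradient for each weight is `(prediction - label) * input`, and the gradient for the bias is `prediction - label`. All numbers are made up, and a finite-difference check confirms the formula numerically.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(inputs, weights, bias):
    return sigmoid(sum(x * w for x, w in zip(inputs, weights)) + bias)

def bce(y_true, y_pred):
    return -(y_true * math.log(y_pred) + (1 - y_true) * math.log(1.0 - y_pred))

def gradients(inputs, weights, bias, y_true):
    """Analytic gradients of the loss for one sigmoid neuron with BCE loss."""
    error = predict(inputs, weights, bias) - y_true
    grad_w = [error * x for x in inputs]  # one gradient per weight
    grad_b = error                        # gradient for the bias
    return grad_w, grad_b

# Hypothetical example: scaled features, current parameters, true label.
inputs = [0.8, 0.6, 0.5, 0.1]
weights = [0.5, -0.2, 0.1, 0.0]
bias = 0.1
y_true = 1

grad_w, grad_b = gradients(inputs, weights, bias, y_true)

# Sanity check: nudge the first weight slightly and measure the loss change.
h = 1e-6
bumped = [weights[0] + h] + weights[1:]
numeric = (bce(y_true, predict(inputs, bumped, bias))
           - bce(y_true, predict(inputs, weights, bias))) / h

print(grad_w[0], numeric)  # the two values should agree closely
```

In a network with more layers, backpropagation applies the same chain-rule idea layer by layer, starting at the output and working backwards.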

### Step 5: Updating Weights and Biases

After calculating the gradients, we update the weights and biases using an optimization algorithm such as gradient descent. The weights and biases are adjusted in the direction that reduces the loss.
This process is repeated over many epochs (an epoch is one full pass through the training data) until the model converges and the loss reaches an acceptable level.
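A single update with plain gradient descent, one of the simplest optimization algorithms, can be sketched as follows. The parameter values, gradients, and learning rate are all made-up numbers for illustration.

```python
learning_rate = 0.1  # step size; a hypothetical value

# Current parameters (made up).
weights = [0.5, -0.2, 0.1, 0.0]
bias = 0.1

# Gradients from the backward pass (made up).
grad_w = [-0.3, -0.2, -0.2, -0.05]
grad_b = -0.4

# Move each parameter a small step against its gradient.
weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
bias = bias - learning_rate * grad_b

print(weights)
print(bias)
```

A negative gradient means that increasing the parameter lowers the loss, so subtracting it pushes the parameter up. The learning rate controls how large each step is.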

To summarize, the steps of training a neural network are:

1. Initialize weights and biases randomly.
2. Perform a forward pass to calculate the output.
3. Calculate the loss.
4. Perform a backward pass to calculate gradients.
5. Update weights and biases using the gradients.
6. Repeat steps 2-5 for multiple epochs until convergence.
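The six steps above can be sketched end to end in plain Python. This is a simplified illustration rather than a full network: it trains a single sigmoid neuron with binary cross-entropy loss and plain gradient descent, and the tiny dataset, learning rate, and epoch count are all made up.

```python
import math
import random

random.seed(0)  # fixed seed (an assumption) for reproducibility

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, weights, bias):
    return sigmoid(sum(xi * wi for xi, wi in zip(x, weights)) + bias)

def bce(y, p, eps=1e-12):
    p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def mean_loss(data, weights, bias):
    return sum(bce(y, predict(x, weights, bias)) for x, y in data) / len(data)

# Made-up data: ([hours, aptitude, prior knowledge, name code], label).
data = [
    ([0.9, 0.8, 0.7, 0.3], 1),
    ([0.8, 0.6, 0.9, 0.9], 1),
    ([0.2, 0.3, 0.1, 0.5], 0),
    ([0.1, 0.4, 0.2, 0.1], 0),
]

# Step 1: initialize weights and bias randomly.
weights = [random.uniform(-0.5, 0.5) for _ in range(4)]
bias = random.uniform(-0.5, 0.5)
learning_rate = 0.5

initial_loss = mean_loss(data, weights, bias)

# Step 6: repeat steps 2-5; each outer iteration is one epoch.
for epoch in range(200):
    for x, y in data:
        p = predict(x, weights, bias)  # Step 2: forward pass
        # Step 3: the loss for this example is bce(y, p).
        error = p - y                  # Step 4: gradient of the loss w.r.t. the weighted sum
        grad_w = [error * xi for xi in x]
        # Step 5: update the parameters against their gradients.
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias = bias - learning_rate * error

final_loss = mean_loss(data, weights, bias)
print(initial_loss, final_loss)  # the loss should drop during training
```

Printing `weights` after training shows which features the neuron ended up relying on for this toy dataset.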

<img src="../09_images/01-neural_network_complete.png" alt="Neural Network Steps" width="1000">


## Extra Reading

For a good overview of neural networks, I would highly recommend watching these four videos:

1. [But what is a Neural Network?](https://www.youtube.com/watch?v=aircAruvnKk)
2. [Gradient Descent, How Neural Networks Learn](https://www.youtube.com/watch?v=IHZwWFHWa-w&list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi&index=2U)
3. [Backpropagation, Intuitively](https://www.youtube.com/watch?v=Ilg3gGewQ5U)
4. [Backpropagation, Calculus](https://www.youtube.com/watch?v=tIeHLnjs5U8)

For loss functions, I would recommend reading the following articles:

- [Loss Functions Explained](https://medium.com/deep-learning-demystified/loss-functions-explained-3098e8ff2b27)
- [PyTorch Loss Functions: The Ultimate Guide](https://neptune.ai/blog/pytorch-loss-functions)
- [PyTorch Loss Functions](https://www.digitalocean.com/community/tutorials/pytorch-loss-functions)

For gradient descent, I would recommend reading the following articles:

- [Gradient Descent Explained](https://medium.com/@abhaysingh71711/gradient-descent-explained-the-engine-behind-ai-training-2d8ef6ecad6f)
- [Gradient Descent Algorithm in Machine Learning](https://www.geeksforgeeks.org/gradient-descent-algorithm-and-its-variants/)
- [Wikipedia: Gradient Descent](https://en.wikipedia.org/wiki/Gradient_descent)

For backpropagation, I would recommend reading the following articles:

- [Wikipedia: Backpropagation](https://en.wikipedia.org/wiki/Backpropagation)
- [Backpropagation in Neural Networks](https://www.geeksforgeeks.org/machine-learning/backpropagation-in-neural-network/)
