# **Training of a Perceptron:**

> **Successfully training a Perceptron (or Neural Network) requires an understanding of several concepts, including `Advanced Linear Algebra` and `Advanced Calculus`.**

---

Before beginning to train a perceptron, we need a solid understanding of the **`forward propagation`** process and its mathematical foundations. This includes knowing how neural networks represent each sample as a row vector and how weight matrices are structured under the PyTorch convention, where weights are stored with shape **`(out_features, in_features)`** and transposed during computation. We must also understand **`bias augmentation`**: the technique of prepending a value of $1$ to each input vector so that the bias can be treated as a regular weight parameter.
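As a small illustration of bias augmentation, the sketch below checks that prepending $1$ to the input (and the bias to the weights) gives the same pre-activation as computing the weighted sum plus bias directly. The values of `x`, `w`, and `b` and the helper name `dot` are made up for this example.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

x = [2.0, -1.0, 0.5]   # one sample as a row vector (illustrative values)
w = [0.4, 0.3, -0.2]   # weights of a single output unit (illustrative values)
b = 0.1                # bias (illustrative value)

# Explicit form: weighted sum plus a separate bias term.
z_explicit = dot(w, x) + b

# Augmented form: the bias becomes the first weight, paired with a constant 1.
x_aug = [1.0] + x
w_aug = [b] + w
z_augmented = dot(w_aug, x_aug)

print(z_explicit, z_augmented)  # the two values agree up to floating-point rounding
```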

Additionally, we need familiarity with **`activation functions`** like **`sign`**, **`ReLU`**, or **`sigmoid`**, and how they introduce **`non-linearity`** into the network. 
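The three activation functions named above can be sketched in a few lines of plain Python. Treating `sign(0)` as `+1` is a convention assumed here; other sources map zero to `0` or `-1`.

```python
import math

def sign(z):
    """Step activation of the classic perceptron: +1 or -1 (zero maps to +1 here)."""
    return 1.0 if z >= 0 else -1.0

def relu(z):
    """Rectified linear unit: passes positive values, clips negatives to 0."""
    return max(0.0, z)

def sigmoid(z):
    """Logistic function: squashes any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

for z in (-2.0, 0.0, 2.0):
    print(f"z={z:+.1f}  sign={sign(z):+.0f}  relu={relu(z):.1f}  sigmoid={sigmoid(z):.4f}")
```

None of these functions is a straight line through the origin, which is what lets stacked layers represent more than a single linear map.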

Understanding **`matrix multiplication mechanics`** is crucial, particularly how dimensions align when computing **`Y = X @ W^T`**, and how batched operations process multiple samples simultaneously through matrix operations rather than loops. 
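To make the dimension bookkeeping concrete, the sketch below spells out what `X @ W^T` computes for a batch of 2 samples with 3 features and a layer with 4 outputs. It is written in plain Python to stay dependency-free, so the loops are explicit; in NumPy or PyTorch the same result would come from a single matrix product. The helper name `matmul_transposed` and all values are assumptions for illustration.

```python
def matmul_transposed(X, W):
    """Compute X @ W^T for X of shape (batch, in) and W of shape (out, in)."""
    return [
        [sum(x_j * w_j for x_j, w_j in zip(x_row, w_row)) for w_row in W]
        for x_row in X
    ]

X = [[1.0, 2.0, 3.0],
     [0.0, -1.0, 1.0]]       # shape (2, 3): 2 samples, 3 input features

W = [[1.0, 0.0, 1.0],
     [0.5, 0.5, 0.5],
     [0.0, 1.0, 0.0],
     [-1.0, 0.0, 2.0]]       # shape (4, 3): (out_features, in_features)

Y = matmul_transposed(X, W)  # shape (2, 4): one row of outputs per sample

print(len(Y), len(Y[0]))
for row in Y:
    print(row)
```

The inner dimensions match because each row of `X` and each row of `W` both have length `in_features`; the result has one row per sample and one column per output unit.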

Finally, we need to grasp **`loss functions`** such as **`squared error`** or **`cross-entropy`**, which quantify the difference between predicted outputs and true labels, providing the **`objective`** that training seeks to minimize.
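A minimal sketch of the two losses mentioned above, for a single prediction. The cross-entropy shown is the binary form with labels in `{0, 1}`, which is an assumption here; the clamp constant `eps` is also an illustrative choice.

```python
import math

def squared_error(y_pred, y_true):
    """Squared difference between a prediction and its target."""
    return (y_pred - y_true) ** 2

def binary_cross_entropy(p, y_true, eps=1e-12):
    """Cross-entropy for a predicted probability p and a label in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)  # clamp so that log(0) never occurs
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))

# A confident correct prediction is penalized less than an unsure one.
print(squared_error(0.9, 1.0), squared_error(0.6, 1.0))
print(binary_cross_entropy(0.9, 1), binary_cross_entropy(0.6, 1))
```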

Beyond the forward pass mechanics, we need conceptual knowledge of how learning occurs through **`iterative optimization`**. This requires understanding **`gradients`** and **`derivatives`**: the loss function's rate of change with respect to each weight tells us how to adjust that weight, because moving a weight a small step against its gradient reduces the loss.
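As an illustration of what a gradient is, the sketch below takes a single linear unit with squared-error loss, computes the gradient with the formula $\partial L / \partial w_i = 2(\hat{y} - y)\,x_i$, and compares it with a finite-difference estimate. The choice of a linear unit, the function names, and the numbers are all assumptions made for this example.

```python
def loss(w, x, y):
    """Squared error of a linear unit: (w . x - y)^2."""
    pred = sum(wi * xi for wi, xi in zip(w, x))
    return (pred - y) ** 2

def analytic_grad(w, x, y):
    """Gradient from calculus: dL/dw_i = 2 * (pred - y) * x_i."""
    pred = sum(wi * xi for wi, xi in zip(w, x))
    return [2.0 * (pred - y) * xi for xi in x]

def numeric_grad(w, x, y, h=1e-6):
    """Central-difference estimate of the same gradient."""
    grads = []
    for i in range(len(w)):
        w_plus, w_minus = list(w), list(w)
        w_plus[i] += h
        w_minus[i] -= h
        grads.append((loss(w_plus, x, y) - loss(w_minus, x, y)) / (2 * h))
    return grads

w, x, y = [0.5, -0.5], [1.0, 2.0], 1.0
g_analytic = analytic_grad(w, x, y)
g_numeric = numeric_grad(w, x, y)
print(g_analytic, g_numeric)
```

Both components come out negative here, so increasing either weight slightly would lower the loss, which is exactly the adjustment a gradient step makes.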

We should know the **`chain rule`** from calculus, which enables backpropagation to compute gradients layer by layer by multiplying `partial derivatives`. 
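The chain rule can be seen at work on the smallest possible case: one sigmoid unit with squared-error loss. The sketch below multiplies three partial derivatives to get $\partial L / \partial w$. The choice of unit, loss, and numeric values is an assumption for illustration only.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

x, w, b, y = 1.5, 0.8, -0.2, 1.0   # illustrative input, weight, bias, target

# Forward pass: z -> a -> L
z = w * x + b
a = sigmoid(z)
L = (a - y) ** 2

# Backward pass: one local derivative per step of the forward pass.
dL_da = 2.0 * (a - y)      # derivative of (a - y)^2 with respect to a
da_dz = a * (1.0 - a)      # derivative of sigmoid(z) with respect to z
dz_dw = x                  # derivative of w*x + b with respect to w

# Chain rule: multiply the local derivatives along the path from L back to w.
dL_dw = dL_da * da_dz * dz_dw
print(dL_dw)
```

In a multi-layer network the same multiplication is repeated layer by layer, reusing the product accumulated so far.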

The concept of `learning rate` is essential, as it controls the step size of weight updates and affects training stability and convergence speed. 
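The effect of the learning rate is easy to see on a toy problem. The sketch below runs gradient descent on $f(w) = w^2$ (a stand-in for a real loss, chosen only because its minimum at $w = 0$ is obvious) with three hypothetical learning rates.

```python
def descend(lr, steps=20, w=1.0):
    """Run `steps` gradient-descent updates on f(w) = w**2 and return the final w."""
    for _ in range(steps):
        grad = 2.0 * w        # derivative of w**2
        w = w - lr * grad     # step against the gradient, scaled by the learning rate
    return w

for lr in (0.01, 0.1, 1.1):
    print(f"lr={lr}: w after 20 steps = {descend(lr):.4f}")
```

With this function each update multiplies `w` by `1 - 2 * lr`, so a very small rate converges slowly, a moderate rate converges quickly, and a rate above `1.0` makes the iterates grow in magnitude instead of shrinking.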

We must understand the difference between `online learning` (updating weights after each sample), `mini-batch learning` (updating after small groups), and `batch learning` (updating after seeing all data), as each has trade-offs in terms of convergence speed and computational efficiency. 
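The three regimes differ only in how many samples are grouped before each weight update. A minimal sketch, using a hypothetical helper `iter_batches` and a placeholder dataset of eight items:

```python
def iter_batches(samples, batch_size):
    """Yield consecutive slices of `samples`, each holding at most `batch_size` items."""
    for start in range(0, len(samples), batch_size):
        yield samples[start:start + batch_size]

data = list(range(8))  # placeholder for 8 training samples

# One weight update would happen per yielded group.
updates_online = len(list(iter_batches(data, 1)))          # 8 updates per epoch
updates_mini = len(list(iter_batches(data, 4)))            # 2 updates per epoch
updates_batch = len(list(iter_batches(data, len(data))))   # 1 update per epoch

print(updates_online, updates_mini, updates_batch)
```

More updates per epoch means more frequent (but noisier) corrections; fewer updates means each one is based on more data but costs a full pass to compute.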

Finally, understanding `epochs` — complete passes through the entire dataset — and the `iterative nature` of training helps us recognize that learning is a `gradual process` of repeatedly computing `predictions`, `measuring errors`, `calculating gradients`, and `updating weights` until the network converges to a solution. 
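The predict, measure, update cycle can be sketched end to end on a tiny problem. The example below is an illustration under stated assumptions, not the series' own implementation: it uses the classic perceptron update rule (the text does not specify one), a `sign` activation with labels in `{-1, +1}`, bias augmentation, online updates, and the logical AND function as training data. All names are hypothetical.

```python
def predict(w, x_aug):
    """Sign activation on the weighted sum of a bias-augmented input."""
    z = sum(wi * xi for wi, xi in zip(w, x_aug))
    return 1 if z >= 0 else -1

def train_perceptron(samples, labels, lr=1.0, max_epochs=100):
    """Online perceptron training; returns the weights and the epochs used."""
    X_aug = [[1.0] + list(x) for x in samples]   # prepend 1 for the bias
    w = [0.0] * len(X_aug[0])
    for epoch in range(1, max_epochs + 1):       # one epoch = one full pass over the data
        errors = 0
        for x_aug, y in zip(X_aug, labels):
            if predict(w, x_aug) != y:           # measure the error
                # Perceptron rule: nudge the weights toward the correct side.
                w = [wi + lr * y * xi for wi, xi in zip(w, x_aug)]
                errors += 1
        if errors == 0:                          # converged: a full pass with no mistakes
            return w, epoch
    return w, max_epochs

samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [-1, -1, -1, 1]                         # logical AND with -1/+1 labels

w, epochs_used = train_perceptron(samples, labels)
predictions = [predict(w, [1.0] + list(x)) for x in samples]
print(w, epochs_used, predictions)
```

Because AND is linearly separable, the loop stops as soon as an entire epoch passes without a misclassification; on data that is not linearly separable it would simply run until `max_epochs`.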

---

**`ABCD_of_Deep_Learning`** is an introduction to deep learning (and especially the perceptron). Training involves advanced concepts and higher-level mathematics, so it is not covered in this series, which was intended to explain the basic concepts, from the idea of the Biological Neuron to the idea of the modern Perceptron. The continuation, covering training and related topics, can be found at the following link:

> **Additional Resource:** __https://github.com/rajesh-coventry/100-Days-of-Deep-Learning__