# __Backpropagation__

What is backpropagation? Backpropagation is a process in which we go through the neural network backwards to calculate the gradients of the loss with respect to the weights and biases. Those gradients are then used to adjust the weights and biases towards a minimum of the loss function. A gradient quantifies the direction and rate of change of a function; since it points in the direction of steepest increase, stepping against it heads towards the minimum of the cost function. These gradients are calculated using calculus, specifically derivatives. Since our neural network is just a fancy function, we can use this strategy to calculate gradients and decrease the cost function, making our model better.

## __Partial Derivative__

Our neuron function takes multiple inputs: the features, the weights and the biases. Using the partial derivative, we can find out what impact a single weight or bias has on the entire cost function. So let's start by doing some partial derivatives. Consider a function $f(x, y) = x + y$. Let's calculate the partial derivative of $f(x, y)$ with respect to $x$ and $y$:

#### $\frac{\partial }{\partial x}f(x,y) = \frac{\partial }{\partial x}(x + y) = \frac{\partial }{\partial x}x + \frac{\partial }{\partial x}y = 1 + 0 = 1$
#### $\frac{\partial }{\partial y}f(x,y) = \frac{\partial }{\partial y}(x + y) = \frac{\partial }{\partial y}x + \frac{\partial }{\partial y}y = 0 + 1 = 1$

### __The partial derivative of a sum equals the sum of the partial derivatives__ 

Now let's try $f(x, y) = x * y$

#### $\frac{\partial }{\partial x}f(x,y) = \frac{\partial }{\partial x}(x * y) = y\frac{\partial }{\partial x}x = y * 1 = y$
#### $\frac{\partial }{\partial y}f(x,y) = \frac{\partial }{\partial y}(x * y) = x\frac{\partial }{\partial y}y = x * 1 = x$

### __To calculate the partial derivative of a product, we treat the other variables as constants and move them outside of the derivative__
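
The sum and product rules can be sanity-checked numerically. The snippet below is a minimal sketch, not part of the original code: `numerical_partial`, `add` and `mul` are hypothetical helper names, and the central-difference step `h` is an arbitrary small value chosen for illustration.

```python
def numerical_partial(func, args, index, h=1e-6):
    """Central-difference estimate of the partial derivative of func
    with respect to args[index] (hypothetical helper for illustration)."""
    up = list(args)
    down = list(args)
    up[index] += h
    down[index] -= h
    return (func(*up) - func(*down)) / (2 * h)


def add(x, y):
    return x + y


def mul(x, y):
    return x * y


point = (3.0, 5.0)  # x = 3, y = 5

# Sum: both partial derivatives should be close to 1.
print(numerical_partial(add, point, 0), numerical_partial(add, point, 1))

# Product: d/dx should be close to y (5), d/dy should be close to x (3).
print(numerical_partial(mul, point, 0), numerical_partial(mul, point, 1))
```

The estimates only approximate the exact derivatives, so they should be compared with a small tolerance rather than for exact equality.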

Now let's do a harder example, with 3 variables

#### $f(x, y, z) = 3x^3z - y^2 + 5z + 2yz$

#### $\frac{\partial}{\partial x}f(x, y, z) = \frac{\partial}{\partial x}(3x^3z - y^2 + 5z + 2yz) = \frac{\partial}{\partial x}3x^3z - \frac{\partial}{\partial x}y^2 + \frac{\partial}{\partial x}5z + \frac{\partial}{\partial x}2yz = \frac{\partial}{\partial x}3x^3z - 0 + 0 + 0 = z\frac{\partial}{\partial x}3x^3 = 9x^2z$

#### $\frac{\partial}{\partial y}f(x, y, z) = \frac{\partial}{\partial y}(3x^3z - y^2 + 5z + 2yz) = \frac{\partial}{\partial y}3x^3z - \frac{\partial}{\partial y}y^2 + \frac{\partial}{\partial y}5z + \frac{\partial}{\partial y}2yz = \frac{\partial}{\partial y}2yz - \frac{\partial}{\partial y}y^2 = 2z - 2y$

#### $\frac{\partial}{\partial z}f(x, y, z) = \frac{\partial}{\partial z}(3x^3z - y^2 + 5z + 2yz) = \frac{\partial}{\partial z}3x^3z - \frac{\partial}{\partial z}y^2 + \frac{\partial}{\partial z}5z + \frac{\partial}{\partial z}2yz = \frac{\partial}{\partial z}3x^3z + \frac{\partial}{\partial z}2yz + \frac{\partial}{\partial z}5z = 3x^3 + 2y + 5$

Now that we have looked at how partial derivatives work for sums and products, we need to look at the derivative of the ReLU function, which is:

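#### $\frac{d}{dx}\max(0, x) = \begin{cases} 1 & x > 0 \\ 0 & x \le 0 \end{cases}$

Strictly speaking, the derivative is not defined at $x = 0$; by convention it is taken to be 0 there.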
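
In code, the ReLU function and its derivative are each a one-liner. This is a minimal sketch in plain Python; the names `relu` and `relu_derivative` are hypothetical, and returning 0 at exactly `x == 0` is a convention assumed here, not something the math dictates.

```python
def relu(x):
    """ReLU activation: passes positive inputs through, clips the rest to 0."""
    return max(0.0, x)


def relu_derivative(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise.

    Assumption: the value at exactly x == 0 is taken to be 0 by convention.
    """
    return 1.0 if x > 0 else 0.0


for value in (-2.0, 0.0, 3.5):
    print(value, relu(value), relu_derivative(value))
```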

## __Gradient__
A gradient is a vector made up of all of the partial derivatives of a function, one for each variable.

Here is the gradient for $f(x,y,z) = 3x^3z - y^2 + 5z + 2yz$:

$\nabla f(x, y, z) = \begin{bmatrix}
9x^2z \\
2z - 2y \\
3x^3 +2y + 5 \\
\end{bmatrix}$
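
As a rough illustration, the gradient of this function can be written as a Python function that returns one partial derivative per variable, using $\frac{\partial f}{\partial x} = 9x^2z$, $\frac{\partial f}{\partial y} = 2z - 2y$ and $\frac{\partial f}{\partial z} = 3x^3 + 2y + 5$. The names `gradient_f` and `numerical_gradient` are hypothetical, and the finite-difference check is only an approximate sanity test under an assumed step size `h`.

```python
def f(x, y, z):
    return 3 * x**3 * z - y**2 + 5 * z + 2 * y * z


def gradient_f(x, y, z):
    """Analytic gradient of f: [df/dx, df/dy, df/dz]."""
    return [9 * x**2 * z, 2 * z - 2 * y, 3 * x**3 + 2 * y + 5]


def numerical_gradient(func, point, h=1e-6):
    """Central-difference estimate of the gradient (illustrative helper)."""
    grad = []
    for i in range(len(point)):
        up = list(point)
        down = list(point)
        up[i] += h
        down[i] -= h
        grad.append((func(*up) - func(*down)) / (2 * h))
    return grad


point = (2.0, 3.0, 4.0)
print(gradient_f(*point))            # [144.0, 2.0, 35.0]
print(numerical_gradient(f, point))  # approximately the same values
```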

## __Chain Rule__
Our loss function is just a chain of functions, such as softmax, ReLU and the dense layers. To improve the loss, we need to know how each weight and bias impacts it, and for a chain of functions we can use the chain rule.

#### $\frac{d}{dx}f(g(x))=f'(g(x)) * g'(x)$

To calculate the partial derivative of a chain of functions with respect to a variable, we take the derivative of the outermost function with respect to its inner function. We then multiply this by the derivative of that inner function with respect to the next inner function, and repeat until we reach the variable.

Let's do an example:

$g(x) = 2x^2$

$f(y) = 3y^5$

#### $\frac{d}{dx}f(g(x)) = f'(g(x)) * g'(x) = 15(2x^2)^4 * 4x = 240x^8 * 4x = 960x^9$
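
The worked example above can be checked with a short script. This is a sketch under the definitions $g(x) = 2x^2$ and $f(y) = 3y^5$; the helper names (`f_prime`, `g_prime`, `chain_rule_derivative`, `simplified_derivative`) and the test point `x = 1.5` are illustrative choices, not part of the original code.

```python
def g(x):
    return 2 * x**2


def f(y):
    return 3 * y**5


def g_prime(x):
    return 4 * x


def f_prime(y):
    return 15 * y**4


def chain_rule_derivative(x):
    """d/dx f(g(x)) = f'(g(x)) * g'(x)."""
    return f_prime(g(x)) * g_prime(x)


def simplified_derivative(x):
    """The simplified closed form derived by hand: 960x^9."""
    return 960 * x**9


x = 1.5
h = 1e-6  # assumed step size for the numerical check
numeric = (f(g(x + h)) - f(g(x - h))) / (2 * h)

print(chain_rule_derivative(x))
print(simplified_derivative(x))
print(numeric)  # should agree with the two values above up to a small error
```

All three values should agree up to a small numerical error, which confirms that multiplying the outer and inner derivatives gives the same result as differentiating the composed function directly.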

That's it for the math behind backpropagation. Next we will apply it, then add it to our original code. Let's keep going!