# Chapter 8: Gradients + Partial Derivatives + Chain Rule 

So far we've talked about some elementary calculus (well, I skipped it, but the book did cover it). Now we can focus on the parts of calculus we'll actually need for our journey here.

## Section 1: Partial Derivatives

Partial derivatives are step one in our journey. A partial derivative measures how much impact a single input has on a function's output. In practice, it is just an ordinary derivative taken with respect to that one input while the other inputs are held constant. Each partial derivative is a single equation, and the full derivative of a multivariate function is the set of those equations, called the gradient. Simply put, the gradient is a vector with one entry per input, where each entry is the partial derivative with respect to that input.

Calculating the partial derivative of a sum is super straightforward! All you need to do is calculate it like a regular derivative, but treat all the other inputs as constants. I don't need to show an example here.

For multiplication, it's very similar, except you must remember that you can pull constants out of a derivative. I'll throw you a brain teaser to check your skills: what is the partial derivative of the following function with respect to x?
$$
f(x,y,z) = 3x^{3}z - y^{2} + 5z + 2yz
$$  
Now, if you don't understand how to get to the following, you should review your Calculus:
$$
\frac{\partial}{\partial x} f(x,y,z) = 9x^{2}z
$$ 
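
If you'd rather check that answer than take my word for it, here is a minimal sketch in plain Python. It is my own sanity check, not something from the book: the helper name `partial_x`, the step size `h`, and the test point are all assumptions I picked for illustration. It nudges only x, holds y and z constant, and compares the result against $9x^{2}z$.

```python
def f(x, y, z):
    # The function from the brain teaser above.
    return 3 * x**3 * z - y**2 + 5 * z + 2 * y * z


def partial_x(func, x, y, z, h=1e-6):
    # Central difference: nudge only x, hold y and z constant.
    return (func(x + h, y, z) - func(x - h, y, z)) / (2 * h)


# Arbitrary test point (an assumption, any point works).
x, y, z = 2.0, 1.0, 3.0

numeric = partial_x(f, x, y, z)
analytic = 9 * x**2 * z  # 9 * 4 * 3 = 108

print(numeric, analytic)
```

The two numbers should agree to several decimal places. Every term without an x in it contributes nothing, which is exactly the "treat the other inputs as constants" idea.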

One more partial derivative to review here is that of the max function, which returns the greatest of its inputs. So, if we take a look at the function:
$$
f(x,y) = \max(x,y) \rightarrow \frac{\partial}{\partial x} f(x,y) = 1(x>y)
$$
Here $1(x>y)$ is an indicator: it returns 1 if $x>y$ is true and 0 if it is false. So the partial derivative with respect to x is 1 when x is the larger input and 0 otherwise.

The reason we go over this is that it relates to our ReLU function, which is technically just $\max(x, 0)$. Its derivative with respect to x is therefore $1(x>0)$, which is likewise 1 if true and 0 if false.
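
As a rough sketch of what that looks like in code (plain Python, with function names `relu` and `relu_derivative` that are my own and not the book's), the derivative is just the indicator applied to each input. Strictly speaking, the derivative is undefined at exactly x = 0; the indicator $1(x>0)$ settles that case by returning 0, and this sketch follows the same convention.

```python
def relu(x):
    # ReLU is max(x, 0).
    return max(x, 0.0)


def relu_derivative(x):
    # Indicator 1(x > 0): 1.0 where the input is positive, else 0.0.
    return 1.0 if x > 0 else 0.0


inputs = [-2.0, -0.5, 0.0, 0.5, 2.0]
outputs = [relu(x) for x in inputs]
slopes = [relu_derivative(x) for x in inputs]

print(outputs)  # [0.0, 0.0, 0.0, 0.5, 2.0]
print(slopes)   # [0.0, 0.0, 0.0, 1.0, 1.0]
```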

Pay attention to the notation above: we use the $\partial$ operator when a function has more than one parameter and we want the partial derivative with respect to one of them. On the other hand, we use the d operator when there is only one parameter and we want the whole derivative.
 
## Section 2: Gradients

The gradient is a vector made up of all the partial derivatives of a function. For example, if we take our function $f(x,y,z) = 3x^{3}z - y^{2} + 5z + 2yz$ then the gradient for it would be:
$$
\nabla f(x,y,z)
=
\begin{bmatrix}
\frac{\partial}{\partial x} f(x,y,z) \\
\frac{\partial}{\partial y} f(x,y,z) \\
\frac{\partial}{\partial z} f(x,y,z)
\end{bmatrix}
=
\begin{bmatrix}
9x^{2}z \\
-2y + 2z \\
3x^{3} + 5 + 2y
\end{bmatrix}
$$

As you can see, it's denoted by $\nabla$, and it is simply a column vector where every row corresponds to the partial derivative of the function with respect to the corresponding variable.
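
To make that concrete, here is a small sketch in plain Python that builds the gradient one partial derivative at a time and compares it with the three entries above. It uses $3x^{3}z$ as the first term, which is the form that matches those entries. The helper names (`analytic_gradient`, `numeric_gradient`), the step size, and the test point are my own assumptions, not the book's code.

```python
def f(x, y, z):
    return 3 * x**3 * z - y**2 + 5 * z + 2 * y * z


def analytic_gradient(x, y, z):
    # One entry per input, in the order x, y, z.
    return [9 * x**2 * z, -2 * y + 2 * z, 3 * x**3 + 5 + 2 * y]


def numeric_gradient(func, point, h=1e-6):
    # Central difference on each input in turn, holding the others fixed.
    grad = []
    for i in range(len(point)):
        up = list(point)
        down = list(point)
        up[i] += h
        down[i] -= h
        grad.append((func(*up) - func(*down)) / (2 * h))
    return grad


point = [2.0, 1.0, 3.0]  # arbitrary test point
exact = analytic_gradient(*point)  # [108.0, 4.0, 31.0]
approx = numeric_gradient(f, point)

print(exact)
print(approx)
```

The loop is the definition of the gradient in miniature: one partial derivative per input, stacked into a single vector.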

In the long run, we'll use both derivatives and these gradients, computed with the chain rule, to perform gradient descent. Computing them is the "backward pass" of model training.

## Section 3: The Chain Rule

During our forward passes, we continuously feed each step's output into the next step. For example, if we carry out the following:
$$
\begin{aligned}
z &= f(x) \\
y &= g(z)
\end{aligned}
$$
Then we could rewrite it as $y = g(f(x))$. A forward pass is nested in the same way; it generally goes `output = outputActivation(dense2(dense1activation(dense1(inputs).output).output).output)`. That is effectively chaining.

The chain rule is stated below:
$$
\begin{aligned}
&1.\quad \frac{d}{dx}\left[(f(x))^{n}\right] = n(f(x))^{n-1} \cdot f'(x) \\
&2.\quad \frac{d}{dx}\left[f(g(x))\right] = f'(g(x)) \cdot g'(x)
\end{aligned}
$$
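
For a quick numerical sanity check of the rule, here is a sketch in plain Python. The particular functions are my own picks, not the book's: an inner function $3x^{2} + 1$ and an outer function $u^{4}$, so the chain rule predicts $4(3x^{2}+1)^{3} \cdot 6x$.

```python
def inner(x):
    # Inner function: 3x^2 + 1
    return 3 * x**2 + 1


def outer(u):
    # Outer function: u^4
    return u**4


def composed(x):
    return outer(inner(x))


def chain_rule_derivative(x):
    # outer'(inner(x)) * inner'(x) = 4 * (3x^2 + 1)^3 * 6x
    return 4 * inner(x) ** 3 * (6 * x)


def numeric_derivative(func, x, h=1e-6):
    # Central difference approximation of d/dx.
    return (func(x + h) - func(x - h)) / (2 * h)


x = 1.0
analytic = chain_rule_derivative(x)  # 4 * 64 * 6 = 1536
numeric = numeric_derivative(composed, x)

print(analytic, numeric)
```

Both values should land very close to 1536, which is the outer derivative evaluated at the inner function's output, multiplied by the inner derivative.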

I could bore you with more calculus and worked chain rule derivations, but I don't think that's necessary or within the scope of my summary of this book.

Our next chapter actually gets to something really interesting: backpropagation. That makes use of all the Calculus (much of which I've omitted here) that has been covered in chapter eight of this book.

### Anyways, that's it for this chapter! Thanks for following along with my annotations of *Neural Networks from Scratch* by Kinsley and Kukieła!
