# Applying the chain rule

### Introduction

At this point, we would probably like to go directly into using gradient descent to find the weights and biases of a neuron that minimize our cost curve. But doing so is not so simple. 

Remember that with gradient descent, we find the change in our function's output as we alter each parameter, and then step in the direction of steepest descent.  The issue with applying this technique here is that the parameters' impact on the output is more indirect.


### Seeing the issue

In previous lessons, we found the rate of change in our output as we changed the parameters in a function like:

$$z(w,b) = 3w + b$$

To find how the output of $z$ changes as we change each parameter, we take the partial derivative with respect to each parameter, here $w$ and $b$.  But when trying to find the parameters that minimize a cost curve, we are no longer finding the impact of altering the parameters on that same function $z$, but on another function, our cost function:

$$J(w,b) = (y - z(w, b))^2$$

> For right now, we are leaving off the summation, just to keep our function a little less intimidating.  

So, as we said, this is more indirect.  And really it is more indirect than even that, because $z(w, b)$ is just the linear component, which is then fed to the activation function $\sigma$ to make the prediction, which in turn is passed into our cost function:

$$J(w,b) = (y - \sigma(z(w, b)))^2$$

So, as we said, seeing how changing the parameters $w$ and $b$ impacts our cost curve is more indirect than what we previously saw.  But don't worry, mathematicians have already figured out how to solve problems like the one above.  We just have to learn their approach.
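To make the nesting concrete, here is a minimal sketch of the cost function as three functions feeding into one another. It assumes that $\sigma$ is the logistic sigmoid, and it passes the target `y` in as an explicit argument, which the formulas above leave implicit. The example inputs are chosen only for illustration.

```python
import math

def z(w, b):
    # linear component from above: z(w, b) = 3w + b
    return 3 * w + b

def sigma(value):
    # assumption: the activation is the logistic sigmoid
    return 1 / (1 + math.exp(-value))

def J(w, b, y):
    # cost for a single observation; y is passed explicitly here (an assumption)
    return (y - sigma(z(w, b))) ** 2

# with w = 0 and b = 0, z is 0 and sigma(0) is 0.5, so the cost is (1 - 0.5)**2
cost = J(0, 0, 1)
print(cost)
```

Notice that `w` and `b` only reach `J` after passing through `z` and then `sigma`, which is exactly the indirection described above.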

### Introducing the Chain Rule

The problem that we are running into above is how to find the derivative of a composite function.  Our cost function depends on our hypothesis function, which is composed of the sigmoid and the linear function:

$$J(w, b) = (y - \sigma(z(w, b)))^2$$

Let's see how we can find the derivative of composite functions with a simpler example.

$$f(x) = (3x + 1)^2$$

#### 1. Break it down

* *In Math*

Do you see how this is a composite function?  We start with the function:

$$f(x) = (3x + 1)^2$$

And then we break this into two functions, $h(x)$ and $g(y)$ where:

$$ g(y) = y^2$$
$$h(x) = 3x + 1$$

Now we can rewrite our function, $f(x)$ as:

$$f(x) = g(h(x)) $$ 

So we broke our function $f(x)$ down above, by defining two functions $h(x)$ and $g(y)$, and then passing the output of $h(x)$ into $g(y)$.

* *In Code* 

We can also translate breaking down the function $f(x) = (3x + 1)^2$ into code:

In [30]:
# def f(x): return (3*x + 1)**2 is equivalent to:
def f(x): 
    return g(h(x))

def g(y): 
    return y**2

def h(x):
    return (3*x + 1)

In [31]:
f(2)

49

#### 2. Finding the derivative

Now we have rewritten our function 
$$f(x) = (3x + 1)^2$$ as:

$$f(x) = g(h(x)) $$  where:

$$h(x) = 3x + 1$$
$$ g(h) = h^2$$

To find the derivative of $f(x)$ with respect to $x$, we do the following:

> Take the derivative of the outer function $g(h)$ and multiply it by the derivative of the inner function $h(x)$.

$f'(x) = g'(h(x)) * h'(x)$

Or using our other notation: $\frac{\delta f}{\delta x} = \frac{\delta g}{\delta h}*\frac{\delta h}{\delta x}$

Let's solve these individually.

$g(h(x)) =  h(x)^2$ and $\frac{\delta g}{\delta h} =  2h(x)$

$h(x) =  3x + 1$, and $\frac{\delta h}{\delta x} =  3$

Now we plug these components into $\frac{\delta f}{\delta x} = \frac{\delta g}{\delta h}*\frac{\delta h}{\delta x}$.

Substituting we get:

$\frac{\delta f}{\delta x} = \frac{\delta g}{\delta h}*\frac{\delta h}{\delta x} = 2h(x)*3 = 6*h(x)$

And because $h(x) = (3x + 1)$, substituting further we get: 

$\frac{\delta f}{\delta x} = 6h(x) = 6(3x + 1) = 18x + 6 $
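The derivation above can also be checked in code. The sketch below is only an illustration: the helper names `g_prime`, `h_prime`, and `f_prime` are ours, not part of the lesson's earlier cells. It multiplies the derivative of the outer function by the derivative of the inner function and compares the result with $18x + 6$.

```python
def h(x):
    # inner function
    return 3 * x + 1

def g_prime(h_value):
    # derivative of the outer function g(h) = h**2 with respect to h
    return 2 * h_value

def h_prime(x):
    # derivative of the inner function h(x) = 3x + 1 with respect to x
    return 3

def f_prime(x):
    # chain rule: f'(x) = g'(h(x)) * h'(x)
    return g_prime(h(x)) * h_prime(x)

# the chain rule result agrees with the simplified form 18x + 6
for x in [0, 1, 2, 3]:
    print(x, f_prime(x), 18 * x + 6)
```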

### Wrapping Up

So what we just did is pretty cool.  We were able to calculate how nudging the value of $x$ impacts our function $f(x) = (3x + 1)^2$.

And what we found was that: 

$\frac{\delta f}{\delta x} = 18x + 6 $

So for example, when $x = 3$, the rate of change of our function with respect to $x$ is $18*3 + 6 = 60$.  And we can see this is approximately the answer we get with our old formula for the derivative, $\frac{\delta y}{\delta x} = \lim_{\delta x\to0}\frac{y_1 - y_0}{x_1 - x_0}$.  The small gap comes from using a finite step of $0.1$ rather than taking the limit.

In [23]:
(f(3.1) - f(3))/(3.1 - 3)

60.90000000000012
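As a rough check that the gap really does come from the step size, the sketch below repeats the calculation with smaller and smaller steps. The helper `difference_quotient` is a name introduced here for illustration. Expanding $(3(3 + \delta) + 1)^2$ by hand shows the quotient equals $60 + 9\delta$, so the values should approach $60$ as the step shrinks.

```python
def f(x):
    return (3 * x + 1) ** 2

def difference_quotient(func, x, delta):
    # rise over run for a step of size delta starting at x
    return (func(x + delta) - func(x)) / delta

# the result moves toward 60 as the step gets smaller
for delta in [0.1, 0.01, 0.001]:
    print(delta, difference_quotient(f, 3, delta))
```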

### Summary

* Why this matters