### Introduction

From previous lessons, you know that the derivative of a function at a point is the slope of the line tangent to the function's graph at that point.  The derivative matters because it tells us the function's rate of change at that point.  To calculate that slope, where the change in the input is effectively zero, we make our change in input smaller and smaller and see what value $ \Delta y/\Delta x $ converges upon.

For example, we saw the following table: 

| $ \Delta x $        | $ \Delta y/\Delta x $|
| ------------- |:-------------:|
| .1      | -171,000      |
| .01 | -179,100     |
| .001 | -179,910      |
| .0001 | -179,991      |


This convergence around one number is called the **limit**.  And we can describe what we see in the above table with the expression: 


 $$ \left.\frac{dy}{dx}\right|_{x=0} = \lim_{\Delta x\to0} \Delta y / \Delta x = -180{,}000  $$

We read this as: the limit of $\Delta y / \Delta x $ (that is, the number that $\Delta y / \Delta x $ approaches) as $ \Delta x $ approaches zero is -180,000.  So $ \lim_{\Delta x\to0} \Delta y / \Delta x $ is our definition of the derivative. 
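The shrinking-delta procedure can be sketched in a few lines of Python.  The function `f`, the point `x = 2`, and the helper name `rate_of_change` below are assumptions chosen for illustration; they are not the function behind the table above, which is not reproduced in this lesson.

```python
def f(x):
    # Hypothetical example function, not the one behind the table above
    return x ** 2

def rate_of_change(function, x, delta_x):
    # Delta y / Delta x over the interval from x to x + delta_x
    delta_y = function(x + delta_x) - function(x)
    return delta_y / delta_x

# Shrink delta_x and watch the rate of change settle toward one number
deltas = [0.1, 0.01, 0.001, 0.0001]
rates = [rate_of_change(f, 2, delta_x) for delta_x in deltas]

for delta_x, rate in zip(deltas, rates):
    print(delta_x, round(rate, 4))
```

For this example function the printed rates get closer and closer to 4 as `delta_x` shrinks, which is the same kind of convergence the table shows.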

### Our rules for calculating the derivative

So far, we have calculated the derivative by shrinking our delta, as reflected in the table above, and seeing the convergence.  However, mathematicians have given us shortcuts to calculate the derivative, and that is what we'll learn about here.  

##### The power rule

The first rule for us to learn is the power rule.  The power rule is expressed as follows.  Given the function:

$$f(x) = x^r $$

Then, the derivative, $f'(x)$ is: 
$$ f'(x) = r*x^{r-1} $$

This says that if a variable, like $x$, is raised to an exponent $r$, then the derivative of that function is the exponent $r$ multiplied by the variable, with the variable raised to the original exponent minus one.

So for example, consider the following function:

$$f(x) = x^2 $$
$$f'(x) = 2*x^{2-1} =  2*x^1 = 2*x $$

Think about what this is saying about our function, $f(x) = x^2 $.  It says that for $f(x) = x^2$, a small change in $x$ produces a change in $f(x) $ of approximately $ 2x $ times that change in $x$.  

So when $ x = 2 $, we solve for the derivative at that point by simply plugging in $ 2 $ wherever we see $ x $. 
So since: 
$$ f'(x) = 2*x $$
$$f'(2) = 2*2 = 4 $$

And when $ x = 10 $, then,

$$f'(10) = 2*10 = 20$$ 
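For this one function, the result can also be checked directly against the limit definition from the introduction.  The following is a worked sketch for $f(x) = x^2$ only, not a proof of the general rule:

$$ \frac{\Delta y}{\Delta x} = \frac{(x + \Delta x)^2 - x^2}{\Delta x} = \frac{2x\Delta x + (\Delta x)^2}{\Delta x} = 2x + \Delta x $$

$$ \lim_{\Delta x\to0} (2x + \Delta x) = 2x $$

As $ \Delta x $ shrinks toward zero, the leftover $ \Delta x $ term disappears and only $ 2x $ remains, which matches what the power rule gave us.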

We won't prove the power rule here.  But hopefully you can see that it does seem to fit our graph of the function $f(x) = x^2$ and of the lines tangent to $f(x)$.

In [17]:
import plotly
from plotly.offline import iplot, init_notebook_mode
init_notebook_mode(connected=True)

unscaled_values = list(range(-30, 41, 1))
x_values = list(map(lambda point: point/10, unscaled_values))

def y(x):
    return x**2

def line_function_data(line_function, x_values):
    # Build a plotly trace (a dict of x and y values) for the function over x_values
    y_values = [line_function(x) for x in x_values]
    return {'x': x_values, 'y': y_values}

def build_tangent_line(original_function, x, line_length = 5, delta = .01):
    # Build a plotly trace for the line tangent to original_function at x,
    # extending line_length units to either side of x
    curve_at_point = derivative_at(original_function, x, delta)
    slope = curve_at_point['slope']
    x_minus = x - line_length
    x_plus = x + line_length
    y_at_x = original_function(x)
    y_minus = y_at_x - slope * line_length
    y_plus = y_at_x + slope * line_length
    text = '    slope:' + format(slope, '.2f')
    # 'middle right' places the slope label to the right of the point it is attached to
    return {'x': [x_minus, x, x_plus], 'y': [y_minus, y_at_x, y_plus], 'mode': 'lines+text', 'text': [text], 'textposition': 'middle right'}

def derivative_at(original_function, x, delta = .01):
    # Approximate the derivative at x as delta y / delta x for a small delta
    numerator = original_function(x + delta) - original_function(x)
    slope = numerator/delta
    return {'value': x, 'slope': slope}

tangent_at_neg_one = build_tangent_line(y, -1, .4, .001)
tangent_at_two = build_tangent_line(y, 2, .4, .001)
tangent_at_three = build_tangent_line(y, 3, .4, .001)

def plot(traces):
    plotly.offline.iplot(traces)

x_squared = line_function_data(y, x_values)

plot([x_squared, tangent_at_neg_one, tangent_at_two, tangent_at_three])

It seems reasonable that the slope of the line tangent to the curve at each point is $2*x$.  So our power rule for derivatives looks good.
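The same comparison can be made with numbers instead of a plot.  The sketch below is a minimal check, assuming the hypothetical helper names `power_rule_derivative` and `numerical_slope` (neither appears in the lesson code): it approximates the slope of $x^2$ with a small delta and sets it beside the value the power rule predicts.

```python
def power_rule_derivative(exponent, x):
    # Power rule: the derivative of x**exponent is exponent * x**(exponent - 1)
    return exponent * x ** (exponent - 1)

def numerical_slope(function, x, delta=0.001):
    # Approximate the slope at x as delta y / delta x for a small delta
    return (function(x + delta) - function(x)) / delta

def x_squared(x):
    return x ** 2

# The same three points used for the tangent lines in the plot
points = [-1, 2, 3]
comparison = [
    (x, numerical_slope(x_squared, x), power_rule_derivative(2, x))
    for x in points
]

for x, approximate, from_power_rule in comparison:
    print(x, round(approximate, 3), from_power_rule)
```

The two columns should agree to within roughly the size of the delta, since the numerical slope is only an approximation of the limit.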

Now let's try our power rule with a linear function.  For example, consider the function:

$ f(x)= 4x$

Note that this can also be written as $f(x) = 4x^1$.

Applying the power rule, we have: 

$$ f'(x) = 1*4*x^{1 - 1} = 4*x^0 = 4*1 = 4$$

Ok, so the power rule tells us that $f'(x) = 4 $.  This says that for any small change in $x$, the change in $f(x) $ is always 4 times that change in $x$.  Once again, let's look at a plot of $f(x)$ and its tangent lines for various values of $x$ to see if this holds true.  We have:

In [22]:
def linear_function(x):
    return 4*x

linear_tangent_at_neg_one = build_tangent_line(linear_function, -1, .4, .001)
linear_tangent_at_two = build_tangent_line(linear_function, 2, .4, .001)
linear_tangent_at_three = build_tangent_line(linear_function, 3, .4, .001)

four_x = line_function_data(linear_function, list(range(-2, 3)))
plot([four_x, linear_tangent_at_neg_one, linear_tangent_at_two])

Notice that for the function $f(x) = 4x$, the slope of our tangent never changes.  It is constant.  So unlike our function $f(x) = x^2 $, the rate of change in our function is the same regardless of the value of $x$.  And as you can see, our power rule for derivatives holds up again.

##### The constant factor rule

The above also made use of the constant factor rule.  The constant factor rule addresses how to take the derivative of a function multiplied by a constant. 

So in the above example, we have our function $f(x) = 4*x$.  Now, the derivative of that function is:

$$f'(x) = 4 * \frac{d}{dx}(x) $$

Here $ \frac{d}{dx} $ simply means "take the derivative, with respect to $x$, of what follows."  Applying the power rule, we know that $ \frac{d}{dx}(x) = 1 $, so we have: 

$$f'(x) = 4 * \frac{d}{dx}(x) = 4*1 = 4$$

In the general case, consider the function $a*f(x)$ where $a$ is a constant (that is, a number and not a variable).  Then the derivative is $\frac{d}{dx}(a*f(x)) = a * \frac{d}{dx}f(x) $.  

Now, don't let the fancy equations above confuse you.  The rule simply says that if a variable is multiplied by a constant (i.e. a number), then to take the derivative of that term, apply the power rule to the variable and multiply the result by that same constant.

So given the function 
$$f(x) = 2x^2 $$
$$f'(x) = 2*\frac{d}{dx} x^{2} = 2*2*x^{2-1} = 4x^1 = 4x $$

That's the constant factor rule in action.
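One way to see the two rules working together is to describe a single term by its constant and its exponent.  The sketch below assumes a hypothetical representation, a `(coefficient, exponent)` pair standing for `coefficient * x**exponent`, and a hypothetical helper `term_derivative`; it is an illustration rather than a general-purpose tool.

```python
def term_derivative(coefficient, exponent):
    # Constant factor rule plus power rule for one term coefficient * x**exponent:
    # the constant carries through, the old exponent multiplies it,
    # and the exponent drops by one
    return (coefficient * exponent, exponent - 1)

two_x_squared = term_derivative(2, 2)  # the term 2x^2
four_x = term_derivative(4, 1)         # the term 4x

print(two_x_squared)  # (4, 1), meaning 4x^1 = 4x
print(four_x)         # (4, 0), meaning 4x^0 = 4
```

Both results match the derivatives worked out by hand above.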

##### The addition rule

So far all of our functions have only had one term.  Remember that a term is a constant, a variable, or a product of the two, and that terms are separated from one another by a plus or minus sign.  So the function $f(x)$ below has three terms:
    
$ f(x) = 4x^3 - x^2 + 3x $

To take the derivative of a function that has multiple terms, simply take the derivative of each of the terms individually.  So $ f'(x) = 12x^2 - 2x + 3  $.  Do you see what we did there?  We simply applied our previous rules to each of the terms individually and continued to add or subtract the terms accordingly.
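The term-by-term approach can be sketched in Python.  The representation here is an assumption made for illustration: each term is a hypothetical `(coefficient, exponent)` pair, a subtracted term gets a negative coefficient, and the helper names `term_derivative`, `derivative_of` and `evaluate` are not part of the lesson code.

```python
def term_derivative(coefficient, exponent):
    # Power rule and constant factor rule applied to one term
    return (coefficient * exponent, exponent - 1)

def derivative_of(terms):
    # Addition rule: differentiate each term on its own and keep them in order
    return [term_derivative(coefficient, exponent) for coefficient, exponent in terms]

def evaluate(terms, x):
    # Add up coefficient * x**exponent over all of the terms
    return sum(coefficient * x ** exponent for coefficient, exponent in terms)

# f(x) = 4x^3 - x^2 + 3x
f_terms = [(4, 3), (-1, 2), (3, 1)]
f_prime_terms = derivative_of(f_terms)

print(f_prime_terms)               # [(12, 2), (-2, 1), (3, 0)], i.e. 12x^2 - 2x + 3
print(evaluate(f_prime_terms, 2))  # 12*4 - 2*2 + 3 = 47
```

The list of derivative terms reads off as $ 12x^2 - 2x + 3 $, the same result obtained by hand.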

##### The chain rule

Ok, one more rule and then we are done.

### Summary

***
This section is a little different in that we did not intuitively prove the rules we applied.  But hopefully, you still have an intuition for the derivative of a function.  The derivative of a function at a given point is simply the rate of change of that function at that point.  And we calculate it with the formula $ \lim_{\Delta x\to0} \Delta y / \Delta x $.

In this section we saw that we can find the minimum error by following the line tangent to a graph.  And we can move along by following the line tangent to the spot where we are currently located.  We then saw how this holds for a two-dimensional graph, by considering how our error changes with respect to a change in b.  We identified this change in output from an infinitesimally small change in input as our derivative.  

Then we considered three rules that allow us to calculate our derivative.  The trickiest of these is the power rule, which says that if $f(b) = b^n$, then $ f'(b) = n * b^{n -1} $.  We still haven't seen how derivatives give us a way to understand gradient descent, but we will shortly when we consider how to take derivatives of functions with multiple variables, like an error function that is dependent on both m and b.

But first, let's practice what we know about derivatives in a lab.