# Derivatives of straight lines

### Learning objectives

* Understand that derivatives are the instantaneous rate of change of a function
* Understand how to calculate a derivative 
* Understand how to express taking a derivative at a given point, and evaluating a function at a given point mathematically

### Introduction

In the lesson discussing step sizes of our gradient descent algorithm, we filled in some more information on how to find the "best fit" regression line using gradient descent.  Namely, we learned how to more carefully change the y-intercept of the regression line to minimize the residual sum of squares.  

We did this by calibrating the size and direction of our change to a regression line parameter -- let's say $b$, our y-intercept -- to the slope of the line tangent to the cost curve at that value of $b$.  By tangent line, we mean a line that "just touches" our curve at a given point.  

So below is a curve that shows the RSS of a regression line with different values of $b$.  Our orange, green, and red lines are each tangent to the curve at their respective points. 

![](./tangent-lines.png)

With our gradient descent algorithm, the larger the absolute value of the slope, the larger the change in our regression line parameter -- that is, the larger our step size.  So we take a much larger step when our slope is -146.17 at $b = 70$ than we do when our slope equals -58.51 at $b = 85$.
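To make the link between slope and step size concrete, here is a minimal sketch.  It assumes the common update rule of stepping by the negative slope times a learning rate; the `learning_rate` value of 0.01 and the function name `step_for` are illustrative assumptions, not values taken from the earlier lesson.

```python
# Hypothetical learning rate, chosen only for illustration.
learning_rate = 0.01


def step_for(slope, learning_rate=learning_rate):
    """Return the change applied to b: opposite in sign to the slope,
    and larger when the slope is steeper."""
    return -learning_rate * slope


# Slopes of the cost curve quoted in the text.
slopes = {70: -146.17, 85: -58.51}

for b, slope in slopes.items():
    print(b, step_for(slope))
```

Under these assumptions the step at $b = 70$ is larger than the step at $b = 85$, and both are positive because the slopes are negative.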

So here is what **we know so far:** 
* Our gradient descent technique depends on changing our values according to the slope of our cost curve

Here is **what we do not know:**
* How to find that slope or rate of change of a function at a given point.  

In this lesson, we'll learn how to calculate the rate of change of a function.

We just indicated that we don't know how to find the slope of a function at a given point.  But is that really true?

## Back to what we know

Let's say that we want a function that  represents a person taking a jog.

![](./running-miles.png)

Now our task for this lesson is to be able to calculate the rate of change.  The rate of change comes up whenever we say the word *per*, as in miles per hour: the rate at which the jogger's distance changes for each hour that passes.  So how many miles per hour is this person travelling? 

To calculate the miles per hour, we can simply see where the person is at a given time, then wait an hour and see how far they travelled.  Or we can wait two hours and divide the distance travelled by two.  Or generically, divide the number of miles travelled by the number of hours passed.

In the below graph, to calculate the rate of change, we see the distance travelled between hours one and two, and then make our calculation.

![](./deltaxdeltay.png)

So in hours one and two, our jogger went from mile numbers three to six -- indicated by the orange line.  So then miles per hour is $\frac{ distance}{hours}$.  Or generically, in our graph the rate of change is the change in y divided by the change in x.  Let's see some other ways of expressing this: 

* Another way of expressing **change in y** is:  
   * $y_2 - y_1$ or $\Delta y$, read delta y 
* Another way of expressing **change in x** is:  
   * $x_2 - x_1$ or $\Delta x$, read delta x

And in our example, we have: 

* miles per hour =  $\frac{y_2 - y_1}{x_2 - x_1} = \frac{6 - 3}{2 - 1} = \frac{3}{1} = 3$ mph

And generically we can say that: 

* rate of change $= \frac{rise}{run} = \frac{\Delta y}{\Delta x} = \frac{y_2 - y_1}{x_2 - x_1}$
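As a minimal sketch, the rise-over-run calculation can be written as a small Python function.  The name `rate_of_change` and the `(hours, miles)` point format are illustrative assumptions, not part of this lesson's own code.

```python
def rate_of_change(point_1, point_2):
    """Return rise over run between two (x, y) points."""
    x_1, y_1 = point_1
    x_2, y_2 = point_2
    return (y_2 - y_1) / (x_2 - x_1)


# The jogger is at mile 3 after hour 1 and at mile 6 after hour 2.
mph = rate_of_change((1, 3), (2, 6))
print(mph)  # 3.0
```

Any other pair of points on the jogger's line, such as `(4, 12)` and `(6, 18)`, gives the same 3 miles per hour.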

### Math it up

Now, when we calculated that the rate of change of our jogger is 3 miles per hour, we really calculated the derivative.  The derivative is just the rate of change.  Of course, we know that in math we express our functions as the following: 

$$f(x) = 3x $$

![](./fxderivative.png)

Now let's express our derivative in terms of a function $f(x)$: 

So previously we had: 

* rate of change $= \frac{rise}{run} = \frac{\Delta y}{\Delta x} = \frac{y_2 - y_1}{x_2 - x_1}$

And now we have 
* derivative = rate of change $= \frac{rise}{run} = \frac{\Delta f}{\Delta x} = \frac{f(x_2) - f(x_1)}{x_2 - x_1}$

So instead of $y$, we are using $f(x)$ or $f$ for short.  In the expression on the far right, you see that we replaced $y_2 - y_1$ with $f(x_2) - f(x_1)$.  This makes sense, because really when we say $y_2$ and $y_1$, we mean the function's output at the second x value and the function's output at the first x value.  

Ok, now we can also express this derivative in terms of delta x.  Here it is: 

derivative = $\frac{\Delta f}{\Delta x} = \frac{f(x_1 + \Delta x) - f(x_1)}{\Delta x} $

How does that make any sense?

Well let's go back to our example again.  Looking at the graph below, our claim is we can calculate the derivative through the formula: 

$$ derivative =  \frac{\Delta y}{\Delta x} = \frac{y_2 - y_1}{x_2 - x_1} = \frac{f(x_1 + \Delta x) - f(x_1)}{\Delta x} $$

Below we see that: 
  * $\Delta x = x_2 - x_1 = 2 - 1 = 1$.

And we see that $y_2 = f(x_1 + \Delta x)$, since:

* $y_2 = f(1 + \Delta x) = f(1 + 1) = f(2) = 6$

![](./fxderivative.png)

So our formula in terms of $\Delta x$ holds: 

$$\frac{f(x_1 + \Delta x) - f(x_1)}{\Delta x} = \frac{f(1 + 1) - f(1)}{1} = \frac{6 - 3}{1} = 3$$

The derivative equals the change in output as we change x, divided by our change in x itself.
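The sketch below checks that the two ways of writing the derivative agree for the jogger's function.  The helper names `two_point_form` and `delta_x_form` are made up for this illustration.

```python
def f(x):
    """The jogger's distance in miles after x hours."""
    return 3 * x


def two_point_form(f, x_1, x_2):
    """(f(x_2) - f(x_1)) / (x_2 - x_1)"""
    return (f(x_2) - f(x_1)) / (x_2 - x_1)


def delta_x_form(f, x_1, delta_x):
    """(f(x_1 + delta_x) - f(x_1)) / delta_x"""
    return (f(x_1 + delta_x) - f(x_1)) / delta_x


x_1, x_2 = 1, 2
delta_x = x_2 - x_1

print(two_point_form(f, x_1, x_2))    # 3.0
print(delta_x_form(f, x_1, delta_x))  # 3.0
```

Both forms give the same number because $x_1 + \Delta x$ is just another name for $x_2$.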

### Summary 

In this lesson, we saw that the derivative is the change in output per change in input.  So in the case of our jogger, with our input being time, we see that the derivative is the change in the runner's location (distance travelled) divided by the amount of time passed.

Much of the tricky part of derivatives is the mechanics of expressing them.  Graphically, we see that the derivative is simply the rise over run or 

$$ derivative = \frac{y_2 - y_1}{x_2 - x_1} $$

Then we saw that we can express the derivative in terms of $f(x)$ as in:

$$ derivative = \frac{f(x_2) - f(x_1)}{x_2 - x_1} $$

And finally we saw how we can express the derivative in terms of $\Delta x$ as in:

$$ derivative = \frac{f(x_1 + \Delta x) - f(x_1)}{\Delta x} $$


The tricky part of this is $f(x_1 + \Delta x)$.  But remember, this equals $f(x_2)$: to reach $x_2$ we start at $x_1$ and add our $\Delta x$, then see the output at this value of $x_1 + \Delta x$.  So we can still read this equation of the derivative as the change in a function's output per change in its input.

The rate of change is simply the distance travelled in y, or $y_2 - y_1$, divided by the distance travelled in x, or $x_2 - x_1$.  Let's refer to the distance travelled in x as $\Delta x$, pronounced delta x, and the distance travelled in y as $\Delta y$, pronounced delta y.  

So we have:

rate of change $= \frac{rise}{run} = \frac{\Delta y}{\Delta x} = \frac{y_2 - y_1}{x_2 - x_1}$

The following code plots $f(x) = 3x$ along with its $\Delta x$ and $\Delta f$ segments:

```python
from plotly.offline import init_notebook_mode

# `graph` is the local helper module used by this lesson's notebook.
from graph import build_layout, plot

init_notebook_mode(connected=True)


def trace_values(x_values, y_values, mode='markers', name='data', text=None, options=None):
    """Build a plotly trace dictionary."""
    trace = {'x': x_values, 'y': y_values, 'mode': mode, 'name': name, 'text': text or []}
    trace.update(options or {})
    return trace


def term_output(term, x_value):
    """Evaluate a single (coefficient, exponent) term at x_value."""
    coefficient, exponent = term
    return coefficient * x_value ** exponent


def output_at(list_of_terms, x_value):
    """Evaluate a function, given as a list of terms, at x_value."""
    return sum(term_output(term, x_value) for term in list_of_terms)


def delta_f(list_of_terms, x_value, h):
    """Change in output as the input moves from x_value to x_value + h."""
    return output_at(list_of_terms, x_value + h) - output_at(list_of_terms, x_value)


def delta_f_trace(list_of_terms, x_value, delta_x):
    """Vertical segment showing the change in output, y2 - y1."""
    initial_f = output_at(list_of_terms, x_value)
    delta_y = delta_f(list_of_terms, x_value, delta_x)
    final_f = initial_f + delta_y
    return trace_values(
        x_values=[x_value + delta_x, x_value + delta_x],
        y_values=[initial_f, final_f],
        text=[str(initial_f), str(final_f)],
        mode='lines+text',
        name=f'y2 - y1 = {final_f} - {initial_f} = {delta_y}',
        options={'textposition': 'middle right'},
    )


def delta_x_trace(list_of_terms, x_value, delta):
    """Horizontal segment showing the change in input, x2 - x1."""
    initial_f = output_at(list_of_terms, x_value)
    final_x = x_value + delta
    return trace_values(
        x_values=[x_value, final_x],
        y_values=[initial_f, initial_f],
        text=[str(x_value), str(final_x)],
        mode='lines+text',
        # The label reports x values (the original used f(x) values here).
        name=f'x2 - x1 = {final_x} - {x_value} = {delta}',
        options={'textposition': 'bottom center'},
    )


three_x = [(3, 1)]  # f(x) = 3x as a list of (coefficient, exponent) terms
x_values = list(range(0, 10))
y_values = [3 * x for x in x_values]
three_x_trace = trace_values(x_values, y_values, mode='lines')

# The jogger's line on its own.
layout = build_layout(x_axis={'title': 'number of hours', 'range': [0, 6]}, y_axis={'title': 'distance in miles', 'range': [0, 6]})
plot([three_x_trace], layout)

# The same line with the change in x and the change in f marked between hours one and two.
delta_x_trace_three_x = delta_x_trace(three_x, 1, 1)
delta_f_trace_three_x = delta_f_trace(three_x, 1, 1)
layout = build_layout(x_axis={'title': 'number of hours', 'range': [0, 5]}, y_axis={'title': 'distance in miles', 'range': [0, 9]})
plot([three_x_trace, delta_f_trace_three_x, delta_x_trace_three_x], layout)
```

The slope of a line tangent to the function is called the derivative.  
* Or equivalently, the **derivative** is defined as the instantaneous rate of change of a function.  

This makes sense.  The more our function changes at a specific point, the larger the magnitude of the slope at that point.

> **What's magnitude?** For these purposes, magnitude describes the absolute value of a number.  We use the word because it's not accurate to say that -100 is larger than -99.  After all, -100 is more negative and thus *smaller* than -99.  But it is correct to say that the magnitude of -100 is larger than the magnitude of -99, as the absolute value of -100 equals 100 and the absolute value of -99 is 99.   

So a derivative answers questions about change.  If you look at our blue curve above, the various slopes indicate how much our output is changing (in this case, our RSS) as we increase our input (here, our value of $b$).  At the point $b = 70$, the cost curve decreases a lot.  At the point $b = 90$, the cost curve is still decreasing, but significantly less.  
* Thus at both $b = 90$ and $b = 70$ the derivative is negative as the change to cost is downward.  
* But the magnitude of change at the two points varies: when $b = 70 $, the derivative is -146.17 and when $b = 90$ the derivative is -21.07.

Ok, so the derivative of a function is the rate of change of a function.  But how do we calculate the rate of change of a function?

### Calculating the derivative

Remember that $f(3)$ means to evaluate the function when $x = 3$.  Well, $f'(3)$ (read as f prime of 3) means to evaluate *the derivative* of the function when $x$ equals $3$.  Or to put it another way, evaluate the instantaneous rate of change of a function when $x$ equals $3$.  Or to put it yet another way, evaluate the slope of the tangent line of the function when $x = 3$.  We do that by writing $f'(3)$.

Ok, so now how do we calculate the instantaneous rate of change of our function, $f(x) = 3*x$?  First, let's see that function graphically. 

![](./straight-line.png)

Ok, to measure $f'(3)$ we are measuring the rate of change at $x = 3$.  

The way we can calculate the rate of change at $x = 3$ is to see the output $f(3)$, and then increase our value of $x$ to $x = 5$, and evaluate $f(5)$.  Now we see how large the change in output was over that change in input.  The rate of change is simply our change in output divided by the change in input. 

To translate this to math we can express this as: 

$$f'(3) = \frac{f(5) - f(3)}{5 - 3}$$

Now we can solve this function by plugging in values for our function, $f(x) = 3*x$ where x = 5 and x = 3.  So we have:

$$f(5) = 3*5 $$

$$f(3) = 3*3 $$

So:

$$ f'(3) = \frac{f(5) - f(3)}{5 - 3} = \frac {15 - 9}{5 - 3} = \frac{6}{2} = 3$$

If you think about it, it makes sense that the rate of change of a function $f(x) = 3x $ is 3.  For every unit of $x$ that we increase, the output, $f(x)$ increases by 3.  Looking at our line again, we can see this graphically.  Regardless of where we are evaluating the slope of the tangent line, for this function, that slope is always 3.

![](./straight-line-tangent.png)

So that's how we calculate the derivative.  The derivative is simply the rate of change.  In the graph above our rate of change is always the same -- it's constant.  For every one unit change in $x$, our output changes by 3.  Our rate of change is 3.  

### The Derivative mathematically 
Expressed mathematically, our formula for calculating the derivative looks like this:

$$ f'(x) = \frac{f(x + h) - f(x)}{h}$$ 

Take some time to take in this formula.  It's not going away.  This formula encapsulates our earlier approach.  In our above approach, we let $h = 2$.  Then we calculated:

$$f'(3) = \frac{f(3 + h) - f(3)}{h} = \frac{f(3 + 2) - f(3)}{2} = \frac{f(5) - f(3)}{2} = \frac{15 - 9}{2} = 3$$
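Here is a minimal sketch of that formula in Python.  The function name `difference_quotient` is an illustrative choice, not something defined elsewhere in this lesson.

```python
def f(x):
    return 3 * x


def difference_quotient(f, x, h):
    """Change in output divided by change in input, starting at x."""
    return (f(x + h) - f(x)) / h


print(difference_quotient(f, 3, 2))  # 3.0, matching the calculation of f'(3)

# For a straight line, the result is the same wherever we start
# and however large a step h we take.
for x in [0, 3, 10]:
    for h in [0.5, 1, 2]:
        print(x, h, difference_quotient(f, x, h))
```

Every line printed by the loop shows a rate of change of 3, which is what we would expect from a line whose slope is constant.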

Another word for $h$ is delta, or $\Delta$ or $\Delta x$.  Delta just means change.  We say $\Delta x$ because that is what we are changing, $x$.  Similarly, the change in output is sometimes expressed as $\Delta f$.  So another way of writing our formula is: 

$$ f'(x) = \frac{\Delta f}{\Delta x}$$

The derivative of a function $f(x)$ is the change in the output divided by the change in the input, $x$.

### Derivatives of non-linear functions

So we saw previously that the derivative is the rate of change of our function.  We express this as $ f'(x) = \frac{\Delta f}{\Delta x}$. 

Things become trickier when working with more complicated functions.  And we will run into these functions.  For example, let's consider how to take the derivative of something that resembles our cost curve.  
> After all, figuring out the slope at a given point of a cost curve is what led us here.

![](./fake-cost-curve.png)

This is the graph of the function $f(x) = (300*x - 300)^2 $.  How do we take the derivative when our function looks like this? Let's start by using our earlier technique to calculate the derivative at the point $x = 0$.  

> So we evaluate $f(0)$ and we evaluate our function at a little more than zero, say $f(1)$, so we let $h = 1$.  Ok let's do it!

$$ f'(x) = \frac{f(x + h) - f(x)}{h} $$ 

$$ f'(0) = \frac{f(0 + h) - f(0)}{h} $$ 

$$ f'(0) = \frac{f(0 + 1) - f(0)}{1} = \frac{f(1) - f(0)}{1}$$ 

Now let's calculate $f(0)$ and $f(1)$ and plug in the values.

$$f(0) = (300*0 - 300)^2 = (-300)^2 =  90,000 $$

$$f(1) = (300*1 - 300)^2 = 0^2 = 0 $$

$$f'(0) = \frac{0 - 90,000}{1- 0} = -90,000 $$

Ok, sweet!  Now let's show a line with this slope where $x = 0$.

![](./curve-tangent-d1.png)

Take a close look at the straight line in the graph above.  That straight line is supposed to have the same slope as the blue curve at the point $x = 0$.  After all, we calculated the derivative at that point, and that gave us a slope of -90,000.  Yet that doesn't seem to match the slope of the curve at $x = 0$.

The slope of the straight line should be pointing more downwards, so that our *tangent* line is just touching.  Where did we go wrong?  Let's take another look at our calculation of the derivative. 

$$f(0) = (300*0 - 300)^2 = (-300)^2 =  90,000 $$

$$f(1) = (300*1 - 300)^2 = 0^2 = 0 $$

$$f'(0) = \frac{0 - 90,000}{1- 0} = -90,000 $$

The problem is that if we calculate change in our output divided by the change in our input, and we set $h = 1$, then we are really calculating the rate of change of the function from zero to one.  





But what we **want** to do is calculate the rate of change at just that point $x = 0$.  After all, we want the *instantaneous* rate of change, and that is a different matter.  So we can't just calculate our change by checking once, then waiting and checking again later.  Our derivative means we are calculating how fast a function is changing at any given moment, and precisely at that moment.  And unlike in our function of the line $y = 3x$, here the amount that $y$ decreases or increases is always changing.  The larger our value of $h$, the less our derivative reflects the rate of change at just that point. 

So what we need to do is decrease our value of $h$ to such a small number that it is zero.  But it's ludicrous to calculate the amount of change in an output when the change in our input is zero.  When change in our input is zero, it means there is no change.  How can we calculate rate of change at precisely one spot on a curve?  There's no change at all at just one point.  Change happens across multiple points.

So how do we calculate the rate of change of our function across no change in input?  We use our imagination.  Really.  We calculate the derivative with a $\Delta$ of .1, then calculate it again with a $\Delta$ of .01, then again with a $\Delta$ of .001.  Our derivative calculation should converge on a single number as our $\Delta$ approaches zero, and that number is our derivative.  

**That is, the derivative of a function is the change in the function's output divided by the change in its input as $h$, that is $\Delta x$, approaches zero.**

When $\Delta x = 1$ 

![](./curve-tangent-d1.png)

When $\Delta x = .1$ 

![](./curve-tangent-pt1.png)

When $\Delta x = .01$ 

![](./curve-tangent-pt01.png)

Notice that our lines approach being tangent to the curve as we decrease $\Delta x$.  In addition, our slopes converge to one number.  In fact, by decreasing $\Delta x$ we can see a fairly clear pattern.

| $\Delta x$ | $\Delta y / \Delta x$ |
| ------------- |:-------------:|
| .1      | -171,000      |
| .01 | -179,100     |
| .001 | -179,910      |
| .0001 | -179,991      |
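The values in this table can be reproduced with a short loop.  This is a sketch under the assumption that we reuse the difference quotient from earlier; the name `difference_quotient` is illustrative.

```python
def f(x):
    return (300 * x - 300) ** 2


def difference_quotient(f, x, h):
    """Rate of change of f between x and x + h."""
    return (f(x + h) - f(x)) / h


# Shrink delta x and watch the slope at x = 0 settle on one number.
for delta_x in [1, 0.1, 0.01, 0.001, 0.0001]:
    slope = difference_quotient(f, 0, delta_x)
    print(f"{delta_x:<8} {slope:,.0f}")
```

The first row, with $\Delta x = 1$, is the -90,000 we calculated by hand, and each smaller $\Delta x$ moves the slope closer to -180,000.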


As you can see, as $\Delta x$ approaches zero, $\Delta y / \Delta x$ approaches $-180,000$.  This convergence on one number as we change another number is called the **limit**.  So to describe the above, we would say that at the point $x = 0$, the limit of $\Delta y \div \Delta x$ -- that is, the number that $\Delta y \div \Delta x$ approaches -- as $\Delta x$ approaches zero is -180,000.  We can abbreviate this into the following expression: 

When $x = 0,\lim_{\Delta x\to0} \Delta y / \Delta x = -180,000  $.

Or, better yet, we can update and correct our definition of derivative to equal:

$$ f'(x) = \lim_{ h\to0} \frac{f(x + h) - f(x)}{h} $$ 
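As a worked check, we can apply this definition by hand to $f(x) = (300*x - 300)^2$ at $x = 0$.  The algebra below is an added illustration: it expands the square, cancels $h$, and only then lets $h$ shrink to zero.

$$ f(0 + h) - f(0) = (300h - 300)^2 - 90,000 = 90,000h^2 - 180,000h $$

$$ \frac{f(0 + h) - f(0)}{h} = \frac{90,000h^2 - 180,000h}{h} = 90,000h - 180,000 $$

$$ f'(0) = \lim_{h\to0} (90,000h - 180,000) = -180,000 $$

Plugging $h = .1$ into $90,000h - 180,000$ gives $-171,000$, which agrees with the first row of the table.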

So the derivative is the change in output as we *just nudge* our input.  That is what we mean by *instantaneous rate of change*.  So that is our real definition of a derivative, and you better not forget it.

![](./kris-kross.jpg)

### Summary

In this section, we learned about derivatives.  A derivative is the instantaneous rate of change of a function.  To calculate the instantaneous rate of change of a function, we see the value that $\frac{\Delta y}{\Delta x} $ approaches as $\Delta x $ approaches zero.  This way, we are not calculating the rate of change of a function across a given distance, but rather are finding the rate of change instantaneously. 