# Gradient Descent in 3D

### Learning Objectives

* Understand how the process of gradient descent works when altering both the y-intercept and slope variables

### Introduction

In the last section, we talked about how changing the y-intercept and slope variables of a regression line affects a cost curve.

![regression-scatter.png](attachment:regression-scatter.png)

Now, because our regression lines do not perfectly predict our data, we put a number to a regression line's accuracy by calculating the residual sum of squares (RSS).  And as we know, the size of the RSS is a function of our y-intercept and slope values, and we can plot our RSS as a function of changing these variables.

![](./gradientdescent.png)
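To make the idea concrete, here is a minimal sketch of RSS as a function of the two regression variables. The data points and the names `rss`, `xs`, and `ys` are made up for illustration and are not taken from the lesson's dataset.

```python
def rss(slope, y_intercept, xs, ys):
    """Residual sum of squares for the line y = slope * x + y_intercept."""
    return sum((y - (slope * x + y_intercept)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data that lies exactly on the line y = 2x + 1
xs = [0, 1, 2, 3]
ys = [1, 3, 5, 7]

perfect = rss(2, 1, xs, ys)          # the line passes through every point
worse_slope = rss(3, 1, xs, ys)      # changing only the slope raises the RSS
worse_intercept = rss(2, 0, xs, ys)  # changing only the y-intercept raises it too
print(perfect, worse_slope, worse_intercept)
```

Each pair of slope and y-intercept values gives one RSS value, which is why the cost can be drawn as a surface over those two variables.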

Now remember that gradient descent, when changing just one variable, means taking a cost curve and then stepping along it.  We imagined standing on this two-dimensional curve (shown below) and feeling the slope of our cost curve at a specific value to tell us whether to increase or decrease our y-intercept variable, and how large a step to take.  A step in a direction meant a change in one of our regression variables (in our examples, normally our y-intercept).

![](./tangent-lines.png)

So that was gradient descent in two dimensions.  What does gradient descent mean in three dimensions? 

### Gradient Descent in 3 dimensions

Using our approach of gradient descent, we once again choose an initial regression, which means that we are choosing a point on the graph below, and then walking towards the minimum.  But of course we are now able to walk not just forwards and backwards but across two dimensions.

![](./gradientdescent.png)

To get a sense of how this works in three dimensions, imagine our initial regression line places us at the back corner of the above graph, with a slope of 50 and a y-intercept of negative 20.  Now imagine once again that we cannot see the rest of the graph, yet we still want to approach the minimum.  How do we do this?

Once again, we feel out the slope of the graph with our feet.  But as we said, this time as we shift our feet we are preparing to walk in two-dimensional space.

![](./traveller-stepping.jpg)

So this is our approach.  We shift horizontally a little bit to determine the change in output in the right-left direction, and then shift forward and back to determine the change in output in that direction.  From there we take the next step in the direction of steepest descent.

So now, perhaps, you can get a sense of why our technique of gradient descent is so powerful.  Once we consider that in moving towards our best fit lines, we have a choice of moving anywhere in a two-dimensional space, then using the slope to guide us only becomes more important.    

So this is what our approach will be mathematically.  We'll determine the slope in one dimension, then the other.  Then, we move where that slope is steepest.  This will move us towards the minimum.

To measure the slope in each dimension, one after the other, we'll take the derivative with respect to one variable, and then take the derivative with respect to another variable.
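The walk described above can be sketched in a few lines. This is a rough illustration, not the lesson's implementation: the bowl-shaped `cost` function, the `nudge` size, the `learning_rate`, and the number of steps are all assumptions chosen so the example is easy to follow. Only the starting point (slope of 50, y-intercept of negative 20) comes from the text.

```python
def cost(slope, y_intercept):
    # Hypothetical bowl-shaped cost surface with its minimum at slope=3, y_intercept=-2
    return (slope - 3) ** 2 + (y_intercept + 2) ** 2

def feel_slopes(f, slope, y_intercept, nudge=1e-4):
    """Estimate the slope of f in each direction by shifting one variable at a time."""
    d_slope = (f(slope + nudge, y_intercept) - f(slope - nudge, y_intercept)) / (2 * nudge)
    d_intercept = (f(slope, y_intercept + nudge) - f(slope, y_intercept - nudge)) / (2 * nudge)
    return d_slope, d_intercept

def descend(f, slope, y_intercept, learning_rate=0.1, steps=100):
    for _ in range(steps):
        d_slope, d_intercept = feel_slopes(f, slope, y_intercept)
        # Step against the slope in both directions at once
        slope -= learning_rate * d_slope
        y_intercept -= learning_rate * d_intercept
    return slope, y_intercept

final_slope, final_intercept = descend(cost, slope=50, y_intercept=-20)
print(round(final_slope, 3), round(final_intercept, 3))
```

Stepping against both slopes at once is what moves us downhill along the steepest direction, rather than only forwards or backwards along one axis.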

### Partial Derivatives

To measure these slopes, we use partial derivatives.  Let's be very explicit about what it means to take the partial derivative with respect to a variable.

Imagine that we have our multivariable function: 

$$f(x, y) = y*x^2 $$

And remember, that function looks like the following: 

![](./parabolayx2.png)

To take a derivative with respect to $x$ means asking how the output changes as we nudge only in the $x$ direction.  To express this, we write $\frac{\partial f}{\partial x}$.  That symbol, $\partial$, is a stylized letter d (it is not the Greek lowercase delta, $\delta$), and you will see it used to express taking the derivative of a multivariable function with respect to one variable.  For example, to see how the output of a function $f(x, y)$ changes with respect to $y$, meaning as we nudge our inputs over in the $y$ direction, we write $\frac{\partial f}{\partial y}$.
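As a rough numerical illustration of nudging in only one direction, the sketch below approximates both partial derivatives of the lesson's function $f(x, y) = y*x^2$ at a single point. The helper names `nudge_x` and `nudge_y`, the step size `h`, and the chosen point are assumptions made for this example.

```python
def f(x, y):
    return y * x ** 2

def nudge_x(f, x, y, h=1e-6):
    """Change in output per unit change in x, holding y fixed."""
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def nudge_y(f, x, y, h=1e-6):
    """Change in output per unit change in y, holding x fixed."""
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

# Evaluate both nudges at the point x = 3, y = -1
slope_in_x = nudge_x(f, 3, -1)
slope_in_y = nudge_y(f, 3, -1)
print(round(slope_in_x, 4), round(slope_in_y, 4))
```

The two results differ even though they are measured at the same point, because each one answers a question about a different direction.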

So what does a derivative $\frac{\partial f}{\partial x}$ look like? How do we think of a partial derivative of a multivariable function?

Well remember how we think of a standard derivative of a one variable function, for example $f(x) = x^2 $. 

![](./tangent-liner.png)

Now the partial derivative of a multivariable function is very similar.  It is equal to the slope of a tangent line at a specific $x$ value **and** a specific $y$ value.  Let's break this down by using our patented "freeze-frame" method to see this clearly.  The graphs below show lines tangent to the curve in the $x$ direction.

![](./partial-derivatives-3d.png)

Take a good look at those graphs.  The top left graph shows $\frac{\partial f}{\partial x}$ at different points of $f(x, y)$ where $y = -1$.  So as you can see, $\frac{\partial f}{\partial x} = -6$ at the point $(3, -1)$, as shown by the green line in the top left.  That's because when you move to that point on the graph, $(3, -1)$, and then nudge a little bit in the $x$ direction, the output changes at a rate of $-6$.  And that is represented by the line tangent to the function at that point in the $x$ direction.  You can go through the other points in these graphs, and work through the same logic.

So when taking the partial derivative $\frac{\partial f}{\partial x}$, you can think about moving to the slice of the graph for a given value of $y$, then moving to the proper value of $x$, and then finding the tangent line at that point.

So as you can see by going from graph to graph above, $\frac{\partial f}{\partial x}$ means the change in output from a nudge in the $x$ direction, but the derivative is still influenced by the $y$ component of the function.  You can see this because for different values of $y$, our slice of the graph looks different, and thus the tangent lines at those slices look different.
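The freeze-frame idea can be mimicked in code by first fixing $y$ to get a one-variable slice, and then measuring the tangent slope along that slice. In the sketch below, `make_slice` and `tangent_slope` are hypothetical helper names, and the slopes are numerical estimates rather than exact values.

```python
def f(x, y):
    return y * x ** 2

def make_slice(y):
    """Freeze y, returning a function of x only."""
    return lambda x: f(x, y)

def tangent_slope(g, x, h=1e-6):
    """Approximate the slope of the one-variable function g at x."""
    return (g(x + h) - g(x - h)) / (2 * h)

slopes = {}
for y in [-1, 1, 2, 3]:
    slice_at_y = make_slice(y)
    # Same x values each time, but a different slice for each y
    slopes[y] = [round(tangent_slope(slice_at_y, x), 4) for x in [1, 2, 3]]

for y, values in slopes.items():
    print(f"y = {y}: slopes at x = 1, 2, 3 are {values}")
```

The same three $x$ values give different slopes on each slice, which is the influence of $y$ showing up in a derivative taken with respect to $x$.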

### One more example

This can be a little mind-bending, so let's go through this again for $\frac{\partial f}{\partial y}$, where $f(x,y) = y*x^2$.  Once again, the graph of our function $f(x,y) = y*x^2$ is the following: 

![](./parabolayx2.png)

Now for $\frac{\partial f}{\partial y}$ of a function $f(x, y)$ you can think of sliding through different slices of the function, but this time for different values of $x$.  So again, we have our freeze frame, but this time each frame represents ascending values along the x axis.  Take a look at the bottom left quadrant: the graph of the function $f(x,y)$ makes sense, as when $x = 2$ the function is just $f(2, y) = 4*y$, just as when $x = 3$ it is $f(3, y) = 9*y$.  So now, to think about taking the derivative, once again we move to a slice of the graph for a value of $x$, and then move in the $y$ direction.  So at $x = 2$, $\frac{\partial f}{\partial y} = 4$.  Once we get to the slice where $x = 2$, we are really taking the derivative of the line $f(2, y) = 4y$, and we know that the derivative of a line is always just equal to its slope.  Here it's $4$.

![](./partial-derivatives-dy.png)

So that is our technique for a partial derivative.  For $\frac{\partial f}{\partial y}$ we move to a slice of the curve at a specific value of $x$ and then calculate how much the output changes as we nudge in the $y$ direction.  And for $\frac{\partial f}{\partial x}$ we move to a slice of the curve at a specific value of $y$ and then calculate how much the output changes as we nudge in the $x$ direction.  Just think slide then nudge.  That's a partial derivative.

### Our rule for partial derivatives

Ok, so now that you understand the slide then nudge, maybe you can understand this little shortcut that we can pull.  For any multivariable function, the variables that you are **not** taking the derivative with respect to can just be treated as constants.

For example, with our function $f(x, y) = y*x^2$, when taking the partial derivative $\frac{\partial f}{\partial y}$ we can treat all values of $x$ as a constant.  Let's do it:


$$f(x, y) = y*x^2 $$

$$\frac{\partial f}{\partial y} = x^2*\frac{\partial}{\partial y}(y) = x^2*1 = x^2$$

So that's all it means to take a partial derivative of something: look at what you are taking a derivative with respect to, and only take the derivative of that variable.  And guess what, this result lines up with what we saw earlier.

![](./partial-derivatives-dy.png)

We calculated that $\frac{\partial f}{\partial y} = x^2$, and that is what the graphs show.  When $x = 2$ our derivative is always 4.  And when $x$ is $3$ the derivative is always 9.  So even though we are taking $\frac{\partial f}{\partial y}$, the $x$ value is influencing the steepness of that line.  But by the time we get to our nudge, that value of $x$ is **constant**: its influence has already been applied, and then we are seeing how the output changes as we nudge in the $y$ direction.

Now let's try our rule one more time, this time for $\frac{\partial f}{\partial x}$.

$$f(x, y) = y*x^2 $$

$$\frac{\partial f}{\partial x} = y*\frac{\partial}{\partial x}(x^2) = 2*y*x$$

So this time with $\frac{\partial f}{\partial x}$, we treat $y$ as a constant, as the influence of $y$ is first applied by moving to a slice of our graph for a value of $y$.  Then once there, we are evaluating the change in output as we nudge in the $x$ direction.

![](./partial-derivatives-3d.png)
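One way to gain confidence in the rule is to compare the formulas derived above, $2*y*x$ and $x^2$, against numerical nudges at a handful of points. This is only a sanity check: the sample points, the tolerance, and the function names `df_dx`, `df_dy`, and `numeric_partial` are assumptions picked for illustration.

```python
import math

def f(x, y):
    return y * x ** 2

def df_dx(x, y):
    # Treat y as a constant: the derivative of x^2 is 2x
    return 2 * y * x

def df_dy(x, y):
    # Treat x as a constant: the derivative of y is 1
    return x ** 2

def numeric_partial(f, point, index, h=1e-6):
    """Central-difference estimate, nudging only the input at position `index`."""
    up = list(point)
    down = list(point)
    up[index] += h
    down[index] -= h
    return (f(*up) - f(*down)) / (2 * h)

points = [(1, -1), (2, 1), (3, 2), (-2, 3)]
all_match = all(
    math.isclose(df_dx(x, y), numeric_partial(f, (x, y), 0), abs_tol=1e-4)
    and math.isclose(df_dy(x, y), numeric_partial(f, (x, y), 1), abs_tol=1e-4)
    for x, y in points
)
print(all_match)
```

Agreement at a few points does not prove the rule, but a mismatch here would be a quick signal that a derivative was taken with respect to the wrong variable.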

### Summary

In this section, we have learned how to think about taking the partial derivative of a function.  For the partial derivative, we say we are taking the derivative with respect to a variable.  So for example, we can say for the function $f(x, y)$, take the partial derivative with respect to the variable $x$.  This means we are assessing how the output changes as we nudge in the $x$ direction, and we can express this as $\frac{\partial f}{\partial x}$.  Our rule for taking the partial derivative is to treat the variables that we are not taking the derivative with respect to as constants.  This makes sense, because at the moment we take the derivative by making our "nudge", the only variable that is changing is the one we are taking the derivative with respect to.

In [10]:
from plotly import tools
from plotly.offline import iplot, init_notebook_mode
from graph import line_function_data, line_function_trace, plot, build_tangent_line
init_notebook_mode(connected=True)

def x_squared(x):
    return x**2

tangent_at_one = build_tangent_line(x_squared, 1, .3, .001)
tangent_at_three = build_tangent_line(x_squared, 3, .3, .001)
tangent_at_four = build_tangent_line(x_squared, 4, .3, .001)

# x values from -1.0 to 5.0 in steps of 0.1
x_range = list(range(-10, 51))
x_range_scaled = [x / 10 for x in x_range]
x_squared_trace = line_function_trace(x_squared, x_range_scaled)
plot([x_squared_trace, tangent_at_one, tangent_at_three, tangent_at_four])

In [8]:
from plotly import tools
from plotly.offline import iplot, init_notebook_mode
from graph import line_function_data, line_function_trace, plot, build_tangent_line
init_notebook_mode(connected=True)

def y_equals_neg_one(x):
    return (-1*(x**2))

def y_equals_one(x):
    return 1*(x**2)

def y_equals_two(x):
    return 2*(x**2)

def y_equals_three(x):
    return 3*(x**2)

x_range = list(range(-6, 7))

neg_one_trace = line_function_trace(y_equals_neg_one, x_range)
one_trace = line_function_trace(y_equals_one, x_range)
two_trace = line_function_trace(y_equals_two, x_range)
three_trace = line_function_trace(y_equals_three, x_range)


neg_one_tangent_at_one = build_tangent_line(y_equals_neg_one, 1, .4, .001)
neg_one_tangent_at_three = build_tangent_line(y_equals_neg_one, 3, .4, .001)
neg_one_tangent_at_two = build_tangent_line(y_equals_neg_one, 2, .4, .001)

two_tangent_at_one = build_tangent_line(y_equals_two, 1, .4, .001)
two_tangent_at_three = build_tangent_line(y_equals_two, 3, .4, .001)
two_tangent_at_two = build_tangent_line(y_equals_two, 2, .4, .001)

one_tangent_at_one = build_tangent_line(y_equals_one, 1, .4, .001)
one_tangent_at_three = build_tangent_line(y_equals_one, 3, .4, .001)
one_tangent_at_two = build_tangent_line(y_equals_one, 2, .4, .001)

three_tangent_at_one = build_tangent_line(y_equals_three, 1, .4, .001)
three_tangent_at_three = build_tangent_line(y_equals_three, 3, .4, .001)
three_tangent_at_two = build_tangent_line(y_equals_three, 2, .4, .001)

fig = tools.make_subplots(rows=2, cols=2, subplot_titles=("f(x,-1) = -1*x^2", "f(x,1) = 1*x^2", "f(x,2) = 2*x^2", "f(x,3) = 3*x^2"))

fig.append_trace(neg_one_trace, 1, 1)
fig.append_trace(neg_one_tangent_at_one, 1, 1)
fig.append_trace(neg_one_tangent_at_three, 1, 1)
fig.append_trace(neg_one_tangent_at_two, 1, 1)


fig.append_trace(one_trace, 1, 2)
fig.append_trace(one_tangent_at_one, 1, 2)
fig.append_trace(one_tangent_at_three, 1, 2)
fig.append_trace(one_tangent_at_two, 1, 2)

fig.append_trace(two_trace, 2, 1)
fig.append_trace(two_tangent_at_one, 2, 1)
fig.append_trace(two_tangent_at_three, 2, 1)
fig.append_trace(two_tangent_at_two, 2, 1)

fig.append_trace(three_trace, 2, 2)
fig.append_trace(three_tangent_at_one, 2, 2)
fig.append_trace(three_tangent_at_three, 2, 2)
fig.append_trace(three_tangent_at_two, 2, 2)



fig['layout']['yaxis1'].update(range=[-20, 0])
fig['layout']['yaxis2'].update(range=[0, 20])
fig['layout']['yaxis3'].update(range=[0, 20])
fig['layout']['yaxis4'].update(range=[0, 20])

plot(fig)




In [9]:
from plotly import tools
from plotly.offline import iplot, init_notebook_mode
from graph import line_function_data, line_function_trace, plot, build_tangent_line
init_notebook_mode(connected=True)

def x_equals_neg_one(y):
    # Parenthesize -1: in Python, -1**2 evaluates to -(1**2) = -1
    return y*((-1)**2)

def x_equals_one(y):
    return y*(1**2)

def x_equals_two(y):
    return y*(2**2)

def x_equals_three(y):
    return y*(3**2)

# These slices are functions of y, so the inputs are y values
y_range = list(range(-6, 7))

neg_one_trace = line_function_trace(x_equals_neg_one, y_range)
one_trace = line_function_trace(x_equals_one, y_range)
two_trace = line_function_trace(x_equals_two, y_range)
three_trace = line_function_trace(x_equals_three, y_range)


neg_one_tangent_at_one = build_tangent_line(x_equals_neg_one, 1, .4, .001)
neg_one_tangent_at_three = build_tangent_line(x_equals_neg_one, 3, .4, .001)


two_tangent_at_one = build_tangent_line(x_equals_two, 1, .4, .001)
two_tangent_at_three = build_tangent_line(x_equals_two, 3, .4, .001)


one_tangent_at_one = build_tangent_line(x_equals_one, 1, .4, .001)
one_tangent_at_three = build_tangent_line(x_equals_one, 3, .4, .001)


three_tangent_at_one = build_tangent_line(x_equals_three, 1, .4, .001)
three_tangent_at_three = build_tangent_line(x_equals_three, 3, .4, .001)


fig = tools.make_subplots(rows=2, cols=2, subplot_titles=("f(-1,y) = y*(-1)^2", "f(1,y) = y*1^2", "f(2,y) = y*2^2", "f(3,y) = y*3^2"))

fig.append_trace(neg_one_trace, 1, 1)
fig.append_trace(neg_one_tangent_at_one, 1, 1)
fig.append_trace(neg_one_tangent_at_three, 1, 1)



fig.append_trace(one_trace, 1, 2)
fig.append_trace(one_tangent_at_one, 1, 2)
fig.append_trace(one_tangent_at_three, 1, 2)


fig.append_trace(two_trace, 2, 1)
fig.append_trace(two_tangent_at_one, 2, 1)
fig.append_trace(two_tangent_at_three, 2, 1)


fig.append_trace(three_trace, 2, 2)
fig.append_trace(three_tangent_at_one, 2, 2)
fig.append_trace(three_tangent_at_three, 2, 2)




fig['layout']['yaxis1'].update(range=[-20, 20])
fig['layout']['yaxis2'].update(range=[-20, 20])
fig['layout']['yaxis3'].update(range=[-20, 20])
fig['layout']['yaxis4'].update(range=[-20, 20])

plot(fig)


