### Improving Regression Lines

In the previous section we saw how, after choosing the slope and y-intercept of a regression line, we can use the residual sum of squares (RSS) to distill the goodness of fit into one number.

Now we can go beyond that to find the "best fit" regression line by doing the following:
* Choose a regression line with a guess of values for $m$ and $b$
* Calculate the RSS
* Adjust $b$ and $m$, as these are the only things that can vary in a single-variable regression line.
* Again calculate the RSS 
* Repeat this process

The regression line (that is, the values of $b$ and $m$) with our smallest RSS is our **best fit line**.
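The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample points, the candidate list, and the helper names `rss` and `best_of_candidates` are hypothetical and are not part of the lesson's own code.

```python
# Hypothetical sample points; any list of {'x': ..., 'y': ...} dicts would work.
points = [{'x': 1, 'y': 3}, {'x': 2, 'y': 5}, {'x': 3, 'y': 7}]

def rss(m, b, points):
    """Residual sum of squares for the line y = m*x + b."""
    return sum((point['y'] - (m * point['x'] + b)) ** 2 for point in points)

def best_of_candidates(candidates, points):
    """Return the (m, b) guess with the smallest RSS among those tried."""
    return min(candidates, key=lambda guess: rss(guess[0], guess[1], points))

# Each tuple is one guess for (m, b); we calculate the RSS for every guess
# and keep the one that scores lowest.
candidates = [(1, 0), (2, 0), (2, 1), (3, 1)]
best_m, best_b = best_of_candidates(candidates, points)
```

Trying a fixed list of guesses is the crudest version of the idea; the rest of this lesson looks at how to choose the next guess more deliberately.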

### Assessing goodness of fit

Let's see this technique in action.  For this example, let's imagine that our data looks like the following:

In [1]:
first_show = {'x': 100, 'y': 275}
second_show = {'x': 200, 'y': 300}
third_show = {'x': 400, 'y': 700}

shows = [first_show, second_show, third_show]

We again use our `build_regression_line` function.  Remember that the function takes an initial guess at the slope by drawing a line between the first and last points, which here gives us a slope of about 1.417.  From there it calculates the value of $b$.
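To see where that slope comes from, here is the rise-over-run calculation between the first and last points. The helper name `slope_between` is hypothetical, and this is not necessarily how `build_regression_line` is implemented internally.

```python
# The same three points used in this lesson.
shows = [{'x': 100, 'y': 275}, {'x': 200, 'y': 300}, {'x': 400, 'y': 700}]

def slope_between(first, last):
    """Rise over run between two points."""
    return (last['y'] - first['y']) / (last['x'] - first['x'])

# (700 - 275) / (400 - 100) = 425 / 300
initial_m = slope_between(shows[0], shows[-1])
```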

In [2]:
from linear_equations import build_regression_line
build_regression_line()

1.4166666666666667

In [3]:
def regression_formula(x):
    return 1.417*x + 100

Let's plot this regression formula with our data to get a sense of what this looks like.  First import the necessary libraries to allow us to use `plotly` in our notebook. 

In [4]:
import plotly
from plotly.offline import init_notebook_mode, iplot
init_notebook_mode(connected=True)

Then we can use some [custom built functions](https://github.com/learn-co-curriculum/plotly-helpers/blob/master/graph.py) to plot this data.

In [33]:
from graph import trace, plot, line_function_trace

def line_function_data(line_function, x_values):
    y_values = list(map(lambda x: line_function(x), x_values))
    return {'x': x_values, 'y': y_values}

def error(point, regression_formula):
    return point['y'] - regression_formula(point['x'])

def error_line(regression_line, point):
    y_hat = regression_line(point['x'])
    x_value = point['x']
    name = 'error at ' + str(round(x_value, 1))
    error_value = format(point['y'] - y_hat, '.1f')
    return {'x': [x_value, x_value], 'y': [point['y'], y_hat], 'mode': 'lines+text', 'marker': {'color': 'red'}, 'name': name, 'text': [error_value], 'textposition':'right'}
    
def error_lines(regression_line, points):
    return list(map(lambda point: error_line(regression_line, point), points))

scatter = trace(shows, mode = 'markers')
x_values = list(map(lambda show: show['x'], shows))
regression_plotted = line_function_trace(regression_formula, x_values)
errors = error_lines(regression_formula, shows)

plot([scatter, regression_plotted, *errors])

From there, we calculate the residual sum of squares (RSS).

In [42]:
def regression_formula(x):
    return 1.417*x + 100

# look up the observed y value of the point at a given value of x
def y(x, points):
    point_at_x = list(filter(lambda point: point['x'] == x, points))[0]
    return point_at_x['y']

# calculate the squared error at a given value of x
def squared_error(x, points):
    return (y(x, points) - regression_formula(x))**2

def sum_of_squared_errors(points):
    squared_errors = list(map(lambda point: squared_error(point['x'], points), points))
    return sum(squared_errors)

round(sum_of_squared_errors(shows), 2)

9166.69

Ok, 9166.69.  Is that a good number? Who knows. Let's get a sense of this by plugging in different numbers for *b* and seeing what happens to the residual sum of squares.

| $b$ | residual sum of squares (RSS) |
| ------------- |:-------------:|
| 100 | 9166.69 |
| 110 | 9804.69 |
| 90 | 9128.69 |
| 80 | 9690.69 |
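The values in this table can be reproduced with a short loop. This is a minimal sketch that holds the slope at 1.417 and varies only $b$; the helper name `rss_for_b` is hypothetical.

```python
# The same three points used in this lesson.
shows = [{'x': 100, 'y': 275}, {'x': 200, 'y': 300}, {'x': 400, 'y': 700}]

M = 1.417  # slope held fixed while we vary b

def rss_for_b(b, points, m=M):
    """Residual sum of squares for the line y = m*x + b."""
    return sum((point['y'] - (m * point['x'] + b)) ** 2 for point in points)

# Try each value of b in the order used in the table.
for b in [100, 110, 90, 80]:
    print(b, round(rss_for_b(b, shows), 2))
```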

Now notice that while keeping our value of $m$ at 1.417, we can move towards a smaller residual sum of squares (RSS) by changing our value of $b$.  Setting $b$ to 110 produced a higher error than at 100, so we tried moving in the other direction.  We kept moving our $b$ value lower until we set $b$ = 80, at which point our error increased from the value at 90.  Because the RSS at 90 is lower than at both 80 and 100, we know that the value of $b$ that produces the smallest RSS lies somewhere between 80 and 100 when we set $m$ = 1.417.

So our RSS is a function of how we change the $b$ value and, as we'll see, how we change the $m$ value.  This relationship, where each choice of regression line goes in and its RSS comes out, is called our cost function.  If we plot our cost function as RSS against changing values of $b$, we get the following:

![](./cost-curve-plot.png)

So you can see visually that the RSS bottoms out when $b$ is somewhere around 90. We start at a value of 100, and we can move back and forth until we settle near that minimum.

This technique of adjusting our values to move towards a minimum is called *gradient descent*.  Here, we *descend* along a cost curve.  When the value of our RSS no longer decreases as we change our variable, we stop.
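The stopping rule described here can be sketched as a simple step search over $b$. Note the assumptions: this is a simplified stand-in rather than full gradient descent (which would use the slope of the cost curve to choose the direction and size of each step), and the helper names `rss` and `descend_b` as well as the step size of 10 are hypothetical choices for illustration.

```python
# The same three points used in this lesson.
shows = [{'x': 100, 'y': 275}, {'x': 200, 'y': 300}, {'x': 400, 'y': 700}]

M = 1.417  # slope held fixed, as in the lesson

def rss(b, points, m=M):
    """Residual sum of squares for the line y = m*x + b."""
    return sum((point['y'] - (m * point['x'] + b)) ** 2 for point in points)

def descend_b(b, points, step=10):
    """Step b towards lower RSS; stop when neither neighbor improves on it."""
    current = rss(b, points)
    while True:
        # Look one step to either side and pick the better neighbor.
        best_neighbor = min([b - step, b + step], key=lambda c: rss(c, points))
        neighbor_rss = rss(best_neighbor, points)
        if neighbor_rss >= current:
            # RSS no longer decreases, so we stop here.
            return b, current
        b, current = best_neighbor, neighbor_rss

best_b, best_rss = descend_b(100, shows)
```

With a step of 10 the search can only land on multiples of 10, so it locates the minimum to within one step; a smaller step would narrow it down further.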

### Summary

So our technique from the top of this lesson holds true: 

* Adjust $b$ and $m$, as these are the only things that can vary in a single-variable regression line.
* After each adjustment, calculate the RSS.
* The regression line (that is, the values of $b$ and $m$) that produces the smallest residual sum of squares for our data is the best fit line.