# Intro to regression lines

### Learning Objectives

* Understand how regression lines can help us make predictions about data
* Understand how the components of slope and y-intercept determine the output of a regression line 
* Understand how to represent a line as a function

### A not so simple problem

Now we know a bit about plotting data and we have written some functions, like `trace_values`, `layout`, and `plot`, to help us do so.  You can [view them here](https://github.com/learn-co-curriculum/single-variable-regression/blob/master/lib/graph.py).

### The benefit of a buck

Imagine we are hired as a consultant for a movie executive.  The movie executive receives a budget proposal, and wants to know how much money the movie might make.  We can help him by building a model of the relationship between the money spent on a movie and money made. 

### Representing linear regression graphically

To predict movie revenue based on a budget, let's draw a single straight line that represents the relationship between how much a movie costs and how much it makes.  

> Eventually, we will want to **train** this model to match up against actual data, but for now let's just draw a line to see how it can make estimates.

In [11]:
from lib.graph import trace_values, plot, layout

regression_trace = trace_values([0, 150], [0, 450], mode = 'lines', name = 'estimated revenue')
movie_layout = layout(options = {'title': 'Movie Spending and Revenue (in millions)'})
plot([regression_trace], movie_layout)


By using a line, we can see how much money is earned for any point on this line.  All we need to do is look at a given $x$ value, and find the corresponding $y$ value at that point on the line. 

* Spend 60 million, and expect to bring in about 180 million.  
* Spend 20 million, and expect to bring in 60 million.  

This approach of modeling a linear relationship (that is, drawing a straight line) between an input and an output is called **linear regression**.  We call the input our **explanatory variable**, and the output the **dependent variable**.  So here, we are saying budget *explains* our dependent variable, revenue.

### Representing linear regression with functions

Instead of only representing this line visually, we would also like to represent it with a function. That way, instead of having to **see** how an $x$ value points to a $y$ value along our line, we could simply plug the input into our function to calculate the proper output.  

#### A wrong guess
Let's take an initial (wrong) guess at turning this line into a function.  

**First**, we represent the line as a mathematical formula.

$y = x$

**Then**, we turn this formula into a function:

In [3]:
def y(x):
    return x

y(0)

0

In [4]:
y(10000000)

10000000

This is pretty nice.  We just wrote a function that automatically calculates the expected revenue given a certain movie budget.  This function says that for every value of $x$ that we input to the function, we get back an equal value $y$.  So according to the function, if the movie has a budget of $30$ million, it will earn $30$ million. 

#### A better guess: Matching lines to functions
Take a look at the line that we drew.  Our line says something different.  The line says that spending 30 million brings predicted earnings of 90 million.  We need to change our function so that it matches our line.  In fact, we need a consistent way to turn lines into functions, and vice versa.  Let's get to it.

**We start** by turning our line into the chart below.  It shows how our line relates x-values and y-values, or our budgets and revenues.

| X (budget)       | Y (revenue)           | 
| ------------- |:-------------:| 
| 0      |0 | 
| 30 million      |90 million | 
| 60 million      |180 million | 

**Next**, we need an equation that matches this data, so that we can:
* input 0 and get back 0
* input 30 million and get back 90 million
* input 60 million and get back 180 million

What equation is that?  Well it's $y = 3x$.  Take a look to see for yourself.

* 0 * 3 = 0
* 30 million * 3 = 90 million
* 60 million * 3 = 180 million 

Let's see it in the code.  This is what it looks like:

In [5]:
def y(x):
    return 3*x

In [6]:
y(30000000)

90000000

In [7]:
y(0)

0

Progress! We multiplied each $x$ value by 3 so that our function's outputs correspond to the $y$ values appearing along our graphed line.
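
As a quick sanity check, the sketch below compares the function against each row of the chart above. The `table` list is just an illustrative name for those rows; it is not something defined in `lib/graph.py`.

```python
def y(x):
    # Revenue predicted by the line y = 3x
    return 3 * x

# Rows of the budget/revenue chart, in dollars (illustrative name)
table = [(0, 0), (30_000_000, 90_000_000), (60_000_000, 180_000_000)]

for budget, revenue in table:
    # Prints True for a row when the function agrees with the line
    print(budget, y(budget) == revenue)
```

Every row should print `True`, which suggests the function and the drawn line describe the same relationship.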

### The Slope Variable 

By multiplying $x$ by 3, we just altered the **slope variable**.  The slope variable changes the inclination of the line in our graph.  Slope generally is represented by $m$ like so:

$y = mx$ 

Let's make sure we understand what all of our variables stand for.  Here they are: 

* $y$: the output value returned by the function, also called the **response variable**, as it responds to values of $x$
* $x$: the input variable, also called the **explanatory variable**, as it explains the value of $y$
* $m$: the **slope variable**, determines how vertical or horizontal the line will appear

Let's adapt these terms to our movie example.  The $y$ value is the revenue earned from the movie, which we say is in *response* to our budget.  The *explanatory variable*, $x$, represents our budget, and $m$ corresponds to our value of 3, which describes how much money is earned for each dollar spent.  Therefore, with an $m$ of 3, our line says to expect to earn 3 dollars for each dollar spent making the movie.  Likewise, an $m$ of 2 suggests we earn 2 dollars for every dollar we spend.

> A higher value of $m$ means a steeper line.  It also means that we expect more money earned per dollar spent on our movies.  Imagine the line pivoting to a steeper tilt as we guess a higher amount of money earned per dollar spent.  
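
To see the effect of the slope numerically, here is a small sketch with a hypothetical helper, `revenue_for`, that takes the slope as a second argument. The helper and its name are assumptions made for illustration only.

```python
def revenue_for(budget, m):
    # y = mx: expected revenue for a budget, given slope m
    return m * budget

budget = 30_000_000
print(revenue_for(budget, 2))  # 60000000
print(revenue_for(budget, 3))  # 90000000
```

The same budget yields a larger predicted revenue under the larger slope, which matches the idea of a steeper line.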

### The y-intercept

There is one more thing that we need to learn in order to describe every straight line in a two-dimensional world.  That is the **y-intercept**.

> * The **y-intercept** is the $y$ value of the line where it intersects the y-axis.  
> * Or, put another way, the y-intercept is the value of $y$ when $x$ equals zero.  

Let's add a trace with a higher y-intercept than our initial line to the movie plot.  

In [8]:
regression_trace_increased = trace_values([0, 150], [50, 500], mode = 'lines', name = 'increased est. revenue')
plot([regression_trace_increased, regression_trace], movie_layout)

What is the y-intercept of the original estimated revenue line?  Well, it's the value of $y$ when that line crosses the y-axis.  That value is zero.  Our second line is parallel to the first but is shifted higher so that the y-intercept increases up to 50 million.  Here, for every value of $x$, the corresponding value of $y$ is higher by 50 million.  

* Our formula is no longer $y = 3x$.  
* It is $y = 3 x + 50,000,000$. 
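
One way to confirm the new formula is to check it against the endpoints passed to `trace_values` above, which are in millions. The function name `increased_revenue` is an assumption made for this sketch.

```python
def increased_revenue(x):
    # y = 3x + 50, with x and y both in millions of dollars
    return 3 * x + 50

# Endpoints used for the 'increased est. revenue' trace
x_values = [0, 150]
y_values = [50, 500]

for x, expected in zip(x_values, y_values):
    # Prints True when the formula reproduces the plotted endpoint
    print(x, increased_revenue(x) == expected)
```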

In addition to determining the y-intercept from a line on a graph, we can also see the y-intercept by looking at a chart of points.  

> In the chart below, we know that the y-intercept is 50 million because its corresponding $x$ value is zero. 

| X        | Y           | 
| ------------- |:-------------:| 
| 0      |50 million | 
| 40 million      |170 million | 
| 60 million      |230 million | 
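
The slope can be recovered from the same chart. A minimal sketch, assuming we take the row where $x$ is zero for the y-intercept and divide the change in $y$ by the change in $x$ between two rows for the slope (the variable names are illustrative):

```python
# Rows from the chart, in millions of dollars
points = [(0, 50), (40, 170), (60, 230)]

# The y-intercept is the y value in the row where x is zero
y_intercept = next(y for x, y in points if x == 0)

# Slope: change in y divided by change in x between two rows
(x1, y1), (x2, y2) = points[0], points[1]
slope = (y2 - y1) / (x2 - x1)

print(y_intercept)  # 50
print(slope)        # 3.0
```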

The y-intercept of a line usually is represented by *b*.  Now we have all of the information needed to describe any straight line using the formula below:  

$$y = mx + b $$

Once more, in this formula: 
* $m$ is our slope of the line, and
* $b$ is the value of $y$ when $x$ equals zero.  

So thinking about it visually, increasing $m$ makes the line steeper, and increasing $b$ pushes the line higher.   
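
A rough sketch of this idea in code, using a hypothetical helper `make_line` (not part of `lib/graph.py`) that builds a function from a given slope and y-intercept:

```python
def make_line(m, b):
    # Return a function computing y = mx + b for the given m and b
    def y(x):
        return m * x + b
    return y

original = make_line(3, 0)
steeper = make_line(4, 0)   # larger m: the line rises faster
higher = make_line(3, 50)   # larger b: the whole line shifts up

x = 10
print(original(x), steeper(x), higher(x))  # 30 40 80
```

Returning a function keeps $m$ and $b$ fixed for a given line, so each line can be evaluated at any $x$ without passing its parameters again.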

In the context of our movies, we said that values of $m$ = 3 and $b$ = 50 million describe our line, giving us:

$y = 3x + 50,000,000 $.

Let's translate this into a function.  For any input of $x$ our function returns the value of $y$ along that line.  

In [None]:
def y(x):
    return 3*x + 50000000

In [None]:
y(30000000)

In [None]:
y(60000000)
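
For reference, a small sketch that compares this function with the rows of the chart from the y-intercept section, converted to dollars. The `chart` name is illustrative.

```python
def y(x):
    # y = 3x + 50,000,000, in dollars
    return 3 * x + 50_000_000

# Chart rows from the y-intercept section, in dollars (illustrative name)
chart = [(0, 50_000_000), (40_000_000, 170_000_000), (60_000_000, 230_000_000)]

for budget, revenue in chart:
    # Prints True when the function agrees with the chart row
    print(budget, y(budget) == revenue)

print(y(30_000_000))  # 140000000
```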

### Summary

In this section, we saw how to estimate the relationship between an input variable and an output value.  We did so by drawing a straight line representing the relationship between a movie's budget and its revenue.  We saw the output for a given input simply by looking at the y-value of the line at that input point of $x$.  

We then learned how to represent a line as a mathematical formula, and ultimately a function.  We describe lines through the formula $y = mx + b $, with $m$ representing the slope of the line, and $b$ representing the value of $y$ when $x$ equals zero.  The $b$ variable shifts the line up or down while the $m$ variable tilts the line forwards or backwards.  Translating this formula into a function, we can write a function that returns an expected value of $y$ for an input value of $x$.