# Visualizing Data with Graphs

### Introduction

In the previous section, we introduced all of the basic Python tools: datatypes, variables, data collections like lists and dictionaries, functions, loops, and iterators.  We will use these skills throughout our data science career.

Let's step back and take a look from the macro perspective.  The machine learning process is generally as follows:

* Gather and clean the data 
* Study the data
* Select a model 
* Train: optimize the model against some criterion (e.g., how well the model predicts our known, labeled data)
* Use the model to predict on new data

The tools we learned in the previous section will help us gather and clean data.  We touched on studying the data with visualizations using the Plotly library in the previous section, but now it's time to take a deeper dive into exploring data with visualizations.

### Learning Objectives

* Understand the components of a point on a graph: an $x$ value and a $y$ value
* Understand where to place a point on a graph, given the point's $x$ and $y$ values
* Get a sense of how to use a graphing library, like Plotly, to answer questions about our data

### A common problem

Imagine that Molly is selling cupcakes out of her kitchen.  She gains more and more customers, so she decides to hire a delivery person, Bob.  Molly asks us to calculate which customers are closest to and furthest from Bob.  This way, she can pay him appropriately.

Molly gives us a list of all of the customer locations, along with Bob's.  Here they are:

| Name | Avenue #| Block # | 
|------|------| ------     |
| Bob    | 4  |     8     | 
| Suzie  | 1  |     11     | 
| Fred   | 5  |     8     | 
| Edgar  | 6  |     13     | 
| Steven | 3  |     6     | 
| Natalie| 5  |     4     | 

Now, to determine the person closest to Bob, we decide to plot each customer's location, along with Bob's, on a graph.

### Visualizing Data with Graphs

Before plotting everyone's locations, let's start off with a scatter plot of just one random point, the point $(2, 1)$.

![](./plot-one-point.png)

The graph above uses the Cartesian coordinate system, which displays data along both an x-axis and a y-axis.  The **x-axis** runs horizontally, from left to right, and you can see it as the labeled gray line along the bottom.  The **y-axis** runs vertically, from bottom to top.  You can see it labeled on the far left of our graph.

The graph above shows the x-axis starting at -4 and the y-axis starting at -1, but that's just to make things easy to see.  In reality, you can imagine the x-axis and y-axis both including all numbers from negative infinity to positive infinity.  The blue marker in the top right portion of our graph represents the point where $x = 2$ and $y = 1$.  Do you see why?  It's the place where the $x$ value is $2$ and the $y$ value is $1$.  As a shorthand, mathematicians express this point as $(2, 1)$.  So the format is $(x, y)$, with the $x$ coordinate always coming first.
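In Python, a point like this fits naturally in a tuple. Here is a minimal sketch, where the variable names `point`, `x`, and `y` are just illustrative choices:

```python
# A point is an (x, y) pair, with the x coordinate first.
point = (2, 1)

# Unpack the tuple into its two coordinates.
x, y = point

print(x)  # 2
print(y)  # 1
```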

The light-gray lines form a grid on the graph to help us see where any given **point** is on a graph.  A point in geometry just means a location.  Now, test your knowledge by moving your mouse to the point $(4, 2)$.  Did you get it?  It's the spot at the top right of the graph.

### Plotting our data

Ok, now let's plot the data given.  


| Name | Avenue #| Block # | 
|------|------| ------     |
| Bob    | 4  |     8     | 
| Suzie  | 1  |     11     | 
| Fred   | 5  |     8     | 
| Edgar  | 6  |     13     | 
| Steven | 3  |     6     | 
| Natalie| 5  |     4     | 


Python does not include graphing tools out of the box, so we need to install a library.  This is easy enough.  Simply go to your terminal, type `pip install plotly`, and press Enter.  Or you can press Shift+Enter on the cell below.  If you already have `plotly` installed, you will see a message saying so, which you can safely ignore.

In [1]:
!pip install plotly


Now we have `plotly` on our computer.  The next step is to load it into this notebook.  We do so with the following two lines.

In [2]:
import plotly

plotly.offline.init_notebook_mode(connected=True)
# use offline mode to avoid initial registration

We bring in the `plotly` library by using the keyword `import` followed by the library name, `plotly`.  Next, we create a new dictionary with Python's `dict` constructor.  We pass **named arguments** to the constructor to create a dictionary with an `x` key that points to a list of $x$ values and a `y` key that points to a list of $y$ values.  Note that the $x$ values match the avenue numbers and the $y$ values match the block numbers.  We assign the dictionary to the variable `trace0`, then display the data by passing a list containing `trace0` to the `plotly.offline.iplot` function.

In [3]:
import plotly
plotly.offline.init_notebook_mode(connected=True)
# we repeat these first lines just to keep the code together  

# x values are avenue numbers and y values are block numbers, in table order
trace0 = dict(x=[4, 1, 5, 6, 3, 5], y=[8, 11, 8, 13, 6, 4])

# All that, and it doesn't even look good :(
plotly.offline.iplot([trace0])

The points were plotted correctly, but they are connected by lines that don't represent anything in particular.

The lines are getting in the way.  Let's remove all of the connecting lines by setting `mode="markers"`.  Then, let's also label each of the dots by setting `text` equal to a list of our names.

In [9]:
trace1 = dict(x=[4, 1, 5, 6, 3, 5],
              y=[8, 11, 8, 13, 6, 4],
              mode="markers",
              type="scatter",  # the trace type; `kind` is not a valid property
              text=["Bob", "Suzie", "Fred", "Edgar", "Steven", "Natalie"])

plotly.offline.iplot([trace1])

# much better :)

OK, so if you move your mouse over the dots, you can see the name that corresponds to each point.  For example, when we hover over the dot at $x = 4$ and $y = 8$, we see that it is Bob's point, just as it should be.  Now, who is closest to Bob?  It looks like Fred is closest, since he's only one avenue away on the same block.  Fred seems to be the easiest delivery for Bob.
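We can double-check what the graph suggests with a short calculation. The sketch below assumes that straight-line distance is the measure Molly cares about. The lesson does not say, and counting the avenues plus blocks traveled would be another reasonable choice. The names `locations` and `distance_from_bob` are our own.

```python
import math

# (avenue, block) for each person, copied from the table above.
locations = {
    "Bob": (4, 8),
    "Suzie": (1, 11),
    "Fred": (5, 8),
    "Edgar": (6, 13),
    "Steven": (3, 6),
    "Natalie": (5, 4),
}

bob_x, bob_y = locations["Bob"]

# Straight-line distance from Bob to every customer (Bob himself is skipped).
distance_from_bob = {
    name: math.hypot(x - bob_x, y - bob_y)
    for name, (x, y) in locations.items()
    if name != "Bob"
}

# Pick the names with the smallest and largest distances.
closest = min(distance_from_bob, key=distance_from_bob.get)
furthest = max(distance_from_bob, key=distance_from_bob.get)

print(closest)   # Fred
print(furthest)  # Edgar
```

Under this assumption the numbers agree with the picture: Fred is one unit away, and Edgar is the longest trip.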

### Summary

In this section, we saw how we use data visualizations to better understand the data.  A Cartesian coordinate system nicely represents two-dimensional data.  It represents a point's $x$ value by placing the point horizontally at the correct spot along the x-axis, and a point's $y$ value by placing the point vertically at the correct spot along the y-axis.

To display the data with `plotly`, we need to do a couple of things.  First, we install `plotly` by going to our terminal and running `pip install plotly`.  Then, to use the library, we import `plotly` into our notebook.  Once the library is loaded, we create a new dictionary with keys of `x` and `y`, with each key pointing to a list of the $x$ or $y$ values of our points, and pass it in a list to `plotly.offline.iplot`.  To clean up the appearance, we set the `mode` attribute equal to `'markers'`.
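The dictionary-building step can be sketched as a small helper. The function name `build_trace` and the `records` list are hypothetical and not part of Plotly. The function only assembles the plain dictionary, which we would then hand to `plotly.offline.iplot` in the notebook.

```python
def build_trace(records):
    """Turn (name, x, y) records into a scatter trace dictionary."""
    names = [name for name, _, _ in records]
    xs = [x for _, x, _ in records]
    ys = [y for _, _, y in records]
    # mode="markers" draws the dots without connecting lines.
    return dict(x=xs, y=ys, mode="markers", text=names)


# (name, avenue, block) rows from Molly's table.
records = [
    ("Bob", 4, 8),
    ("Suzie", 1, 11),
    ("Fred", 5, 8),
    ("Edgar", 6, 13),
    ("Steven", 3, 6),
    ("Natalie", 5, 4),
]

trace = build_trace(records)
print(trace["x"])  # [4, 1, 5, 6, 3, 5]
print(trace["y"])  # [8, 11, 8, 13, 6, 4]

# In the notebook, display it with: plotly.offline.iplot([trace])
```

Keeping each person's name, avenue, and block together in one record makes it harder for the three lists to drift out of order.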