## Introduction

#### Note: If the plots do not render locally, please use the Jupyter nbviewer [here](https://nbviewer.jupyter.org/github/skybluu/15388-tutorial/blob/master/final2.ipynb)
In this tutorial, we will show you how to plot interactive graphs in Python, specifically using [Plotly](http://plot.ly/). In contrast to their static counterparts, interactive plots can not only convey much more information than static 2D plots, but can also present higher-dimensional data clearly.
Below is an example plot from the popular plotting framework matplotlib:
<img src="https://matplotlib.org/_images/scatter3d_demo.png">

It is very hard to pinpoint specific data points or identify trends in static plots like this one. Interactive plots address both problems: we can hover over individual points to read their values and zoom in on the regions we care about.

### Tutorial content

To demonstrate the capabilities of a new plotting framework, we'll need data with various characteristics. For this tutorial, we'll be using data collected from the Pittsburgh 311 Data repository from the Western Pennsylvania Regional Data Center: http://data.wprdc.org/dataset. It contains all of the requests the Pittsburgh 311 service receives every day and has over 220,000 entries dating from 2015 to 2018.

We will cover the following topics in this tutorial:
- [Installing the libraries](#Installing-the-libraries)
- [Loading the data](#Loading-the-data)
- [Create Simple Interactive Plots with Plotly](#Create-Simple-Interactive-Plots-with-Plotly)
- [Interactive Maps](#Interactive-Maps)

## Installing the libraries

Installing Plotly is simple and works like installing any other Python library with `pip`.
To install, simply run:
    
    $ pip install plotly --upgrade

In [150]:
import pandas as pd
import numpy as np
import scipy as sp
import plotly.plotly as py
import plotly.figure_factory as ff
import plotly.graph_objs as go
from plotly.graph_objs import *
import plotly

## Loading the data

The original data is given in CSV format, so we can load it directly into a pandas dataframe. To simplify the process, I have already uploaded the file to my GitHub repo. Since Plotly is a web service, we need a Plotly account and its unique API key in order to use it. Setting up an account is free, and all the plots we generate are saved to our Plotly account online. The first line below authenticates with your Plotly account.

After we load the CSV file, we keep only the columns we need to improve performance, since the entire file is huge.
Then we can plot the dataframe directly with Plotly. All we have to do is turn the pandas dataframe into a Plotly table object by calling `ff.create_table()` and pass the result to Plotly's `iplot()` function. The same table is also saved to your Plotly account under the filename "table1" (specified in `iplot()` with the `filename` argument).

In [39]:
plotly.tools.set_credentials_file(username='username', api_key='X')

df = pd.read_csv("https://raw.githubusercontent.com/skybluu/15388-tutorial/master/data/PGH_311_data.csv")
df = df[["CREATED_ON", "REQUEST_TYPE", "REQUEST_ORIGIN", "DEPARTMENT", "NEIGHBORHOOD", "X", "Y"]]
table = ff.create_table(df[:5])
py.iplot(table, filename='table1')
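
If memory is the main concern, the column selection can also happen while the file is being parsed. The sketch below uses the `usecols` parameter of `pd.read_csv` on a tiny in-memory CSV. The sample rows and the extra `STATUS` column are made up for illustration; only the other column names come from the cell above.

```python
import io

import pandas as pd

# Made-up sample standing in for the real CSV file. STATUS is a
# hypothetical column that we do not want to load.
csv_text = (
    "CREATED_ON,REQUEST_TYPE,STATUS,NEIGHBORHOOD,X,Y\n"
    "2017-01-05T10:00:00,Potholes,1,Shadyside,-79.93,40.45\n"
    "2017-02-11T09:30:00,Weeds/Debris,0,Oakland,-79.95,40.44\n"
)

wanted = ["CREATED_ON", "REQUEST_TYPE", "NEIGHBORHOOD", "X", "Y"]

# usecols tells the parser to keep only the listed columns.
small_df = pd.read_csv(io.StringIO(csv_text), usecols=wanted)
print(small_df.columns.tolist())
```

With the real file, the same `usecols` list could be passed alongside the URL instead of slicing the dataframe afterwards.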

## Create Simple Interactive Plots with Plotly

As with any other advanced plotting framework, Plotly has many simple built-in plots; all we have to do is pass in the X and Y values. We'll start with a bar chart to demonstrate and, at the same time, explore the dataset.

To create a bar chart, we have to identify all the labels and the number of times each appears in the dataset. To do this, we iterate through the entire dataframe and count the occurrences of each `REQUEST_TYPE`.

In [40]:
unique_type = []
type_count = []
for idx, row in df.iterrows():
    _type = row['REQUEST_TYPE']
    if (_type not in unique_type):
        unique_type.append(_type)
        type_count.append(1)
    else:
        type_count[unique_type.index(_type)] += 1

data = [Bar(x=unique_type,
            y=type_count)]
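
The row-by-row loop is easy to follow, but it can be slow on a dataframe of this size. As an alternative sketch, pandas can produce the same labels and counts with `value_counts()`. The small dataframe below is made up and only stands in for the `REQUEST_TYPE` column.

```python
import pandas as pd

# Made-up stand-in for the REQUEST_TYPE column of the 311 data.
sample = pd.DataFrame({
    "REQUEST_TYPE": [
        "Potholes", "Weeds/Debris", "Potholes",
        "Snow/Ice removal", "Potholes", "Weeds/Debris",
    ]
})

# value_counts() returns one count per distinct label, largest first.
counts = sample["REQUEST_TYPE"].value_counts()
unique_type = counts.index.tolist()
type_count = counts.tolist()
print(list(zip(unique_type, type_count)))
```

Unlike the loop, this orders the labels by frequency rather than by first appearance, which is usually what we want for a bar or pie chart anyway.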

Now that we have all the unique `REQUEST_TYPE` labels and their counts, we can also feed them into the built-in `Pie()` graph object to create a pie chart and plot it.

In [41]:
trace = go.Pie(labels=unique_type, values=type_count, textinfo = "none")
py.iplot([trace], filename='basic_pie_chart')

This is the first interactive plot that we have created in this tutorial, and we can immediately see its potential. While a static pie chart with this many labels would provide almost no information about the dataset, this interactive version keeps all the data and shows us the exact name, count, and percentage of each label. We can zoom in and out and even select which labels to include by clicking the legend on the right.

Since there are too many different kinds of requests to follow, we'll focus on the top 5 request types from now on.

In [151]:
new_df = df.loc[df['REQUEST_TYPE'].isin(['Potholes','Weeds/Debris', 'Building Maintenance', 'Snow/Ice removal', 'Refuse Violations'])]
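
The five request types above were read off the pie chart by hand. A possible way to pick them programmatically is sketched below, using a made-up dataframe and the top 2 instead of the top 5 to keep the example small.

```python
import pandas as pd

# Made-up sample; the real data has many more request types.
sample = pd.DataFrame({
    "REQUEST_TYPE": [
        "Potholes", "Potholes", "Potholes",
        "Weeds/Debris", "Weeds/Debris",
        "Graffiti",
    ]
})

top_n = 2  # the tutorial uses 5

# The index of the first top_n counts holds the most frequent labels.
top_types = sample["REQUEST_TYPE"].value_counts().head(top_n).index

# Keep only the rows whose request type is one of those labels.
top_df = sample[sample["REQUEST_TYPE"].isin(top_types)]
print(sorted(top_types))
```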

The second graph we're going to plot is a time series of the top 5 requests we just identified from the pie chart.

To create a time series, we need to extract the number of requests of each type in each month. We first add a new column that contains the month and year of each request.

In [67]:
df_mon = new_df  

df_mon['CREATED_ON'] = pd.to_datetime(df_mon['CREATED_ON'])
df_mon['mnth_yr'] = df_mon['CREATED_ON'].map(lambda x: 100*x.year + x.month)

table2 = ff.create_table(df_mon[:5])
py.iplot(table2, filename='table2')
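
The `map` with a lambda computes the month key one row at a time. As a sketch of an alternative, the `.dt` accessor of a datetime column gives the same `year * 100 + month` key for the whole column at once. The two timestamps below are made up.

```python
import pandas as pd

# Two made-up timestamps standing in for the CREATED_ON column.
stamps = pd.Series(pd.to_datetime(["2015-04-20 13:15:00",
                                   "2017-12-01 08:00:00"]))

# Same encoding as in the cell above: 100 * year + month.
mnth_yr = stamps.dt.year * 100 + stamps.dt.month
print(mnth_yr.tolist())  # [201504, 201712]
```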

Then, using the column we've just created, we can group the different request types by month using pandas' `groupby()` function.

In [152]:
group1 = df_mon.groupby(['mnth_yr', 'REQUEST_TYPE'])['mnth_yr']
month_list = []
pot_list = []
weed_list = []
maint_list = []
snow_list = []
refuse_list = []

for name, group in group1:
    #name[0] is time, name[1] is request
    if (name[0] not in month_list):
        month_list.append(name[0])
    
month_list.sort()

for i in month_list:
    pot_list.append(0)
    weed_list.append(0)
    maint_list.append(0)
    snow_list.append(0)
    refuse_list.append(0)
    
for name, group in group1:
    idx = month_list.index(name[0])
    if (name[1] == "Potholes"):
        pot_list[idx] += group.count()
    elif (name[1] == "Weeds/Debris"):
        weed_list[idx] += group.count()
    elif (name[1] == "Building Maintenance"):
        maint_list[idx] += group.count()
    elif (name[1] == "Snow/Ice removal"):
        snow_list[idx] += group.count()
    else:
        refuse_list[idx] += group.count()
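
The three loops above build one list of monthly counts per request type. As a more compact sketch of the same idea, `groupby(...).size()` followed by `unstack()` yields a month-by-type table directly. The sample rows are made up.

```python
import pandas as pd

# Made-up rows with the two columns used for grouping.
sample = pd.DataFrame({
    "mnth_yr": [201701, 201701, 201701, 201702, 201702],
    "REQUEST_TYPE": ["Potholes", "Potholes", "Weeds/Debris",
                     "Potholes", "Snow/Ice removal"],
})

# Rows are months, columns are request types, and months with no
# requests of a given type are filled with 0.
monthly = (sample.groupby(["mnth_yr", "REQUEST_TYPE"])
                 .size()
                 .unstack(fill_value=0))

month_list = monthly.index.tolist()
pot_list = monthly["Potholes"].tolist()
weed_list = monthly["Weeds/Debris"].tolist()
print(monthly)
```

Each column of `monthly` can then be turned into a list for plotting, so no type-specific `if`/`elif` chain is needed.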


This graph is more complicated than the pie chart, so let's first sort out how a plot is put together in Plotly.

Every plot has two main parts: `data` and `layout`.
`data` stores all of the information and is a list of all the plots you want to show on the canvas. The entries are Plotly graph objects, such as `Pie`, `Scatter`, `Heatmap` and so on.
In this plot, we are going to use `Scatter` to plot the time series of each request type. Besides taking `x` and `y` as inputs, it accepts a number of parameters listed in the documentation. The most important ones here are `name` and `hoverinfo`, which together specify what we see when we hover the mouse over a data point. A clear name for each series is very important for interactive graphs.

The `layout` object specifies the layout of the graph. In this case, it manages the margins, labels, and title of the graph.
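
To make the `data`/`layout` split concrete, here is a minimal sketch of a figure written as plain Python dictionaries, which Plotly accepts in place of graph objects. The helper `line_trace` and the sample values are made up for illustration.

```python
# Plain dictionaries are used so that this sketch runs even without
# Plotly installed; line_trace is a hypothetical helper.
def line_trace(x, y, name):
    """Return a dict describing one line of a scatter plot."""
    return dict(
        type="scatter",
        mode="lines",
        x=list(x),
        y=list(y),
        name=name,               # label shown in the legend and on hover
        hoverinfo="x+y+name",    # which fields the hover box displays
    )

# data: a list with one entry per series drawn on the canvas.
data = [line_trace(["Jan", "Feb", "Mar"], [3, 5, 2], "Potholes")]

# layout: everything that is not data (title, margins, axes, ...).
layout = dict(title="Number of Requests", margin=dict(l=100, r=20, t=110))

figure = dict(data=data, layout=layout)
print(sorted(figure))  # ['data', 'layout']
```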

In [95]:
title = 'Number of Requests'
labels = ['Potholes', 'Weeds/Debris', 'Building Maintenance', 'Snow/Ice removal', 'Refuse Violations']
colors = ['rgba(49,130,189, 1)', 'rgba(115,115,115,1)', 'rgba(67,67,67,1)', 'rgba(150,150,150,1)', 'rgba(189,189,189,1)']
mode_size = [12, 8, 8, 8, 8]
line_size = [4, 2, 2, 2, 2]
traces = []

month_list = ['April-15', 'May-15', 'June-15', 'July-15',
         'August-15', 'September-15', 'October-15', 'November-15', 'December-15','January-16', 'February-16', 'March-16', 'April-16', 'May-16', 'June-16', 'July-16',
         'August-16', 'September-16', 'October-16', 'November-16', 'December-16','January-17', 'February-17', 'March-17', 'April-17', 'May-17', 'June-17', 'July-17',
         'August-17', 'September-17', 'October-17', 'November-17', 'December-17','January-18', 'February-18', 'March-18']
#potholes
traces.append(go.Scatter(
    x=month_list, y=pot_list, mode='lines',
    line=dict(color=colors[0], width=line_size[0]),
    connectgaps=True,
    name="Potholes",
    hoverinfo='name'
))

traces.append(go.Scatter(
    x=[month_list[0], month_list[-1]],
    y=[pot_list[0], pot_list[-1]],
    mode='markers',
    marker=dict(color=colors[0], size=mode_size[0])
))

#weed
traces.append(go.Scatter(
    x=month_list, y=weed_list, mode='lines',
    line=dict(color=colors[1], width=line_size[1]),
    connectgaps=True,
    name="Weeds/Debris",
    hoverinfo='name'
))

traces.append(go.Scatter(
    x=[month_list[0], month_list[-1]],
    y=[weed_list[0], weed_list[-1]],
    mode='markers',
    marker=dict(color=colors[1], size=mode_size[1])
))

#building maintenance
traces.append(go.Scatter(
    x=month_list, y=maint_list, mode='lines',
    line=dict(color=colors[2], width=line_size[2]),
    connectgaps=True,
    name="Building Maintenance",
    hoverinfo='name'
))

traces.append(go.Scatter(
    x=[month_list[0], month_list[-1]],
    y=[maint_list[0], maint_list[-1]],
    mode='markers',
    marker=dict(color=colors[2], size=mode_size[2])
))

#snow/ice removal
traces.append(go.Scatter(
    x=month_list, y=snow_list, mode='lines',
    line=dict(color=colors[3], width=line_size[3]),
    connectgaps=True,
    name="Snow/Ice removal",
    hoverinfo='name'
))

traces.append(go.Scatter(
    x=[month_list[0], month_list[-1]],
    y=[snow_list[0], snow_list[-1]], mode='markers',
    marker=dict(color=colors[3], size=mode_size[3])
))

#refuse violations
traces.append(go.Scatter(
    x=month_list,y=refuse_list,mode='lines',
    line=dict(color=colors[4], width=line_size[4]),
    connectgaps=True,
    name="Refuse Violations",
    hoverinfo='name'
))

traces.append(go.Scatter(
    x=[month_list[0], month_list[-1]],
    y=[refuse_list[0], refuse_list[-1]],
    mode='markers',
    marker=dict(color=colors[4], size=mode_size[4])
))

layout = go.Layout(
    legend=dict(
        y=0.5,
        traceorder='reversed',
        font=dict(
            size=16
        )
    ),
    xaxis=dict(
        showline=True,showgrid=False,showticklabels=True,
        linecolor='rgb(204, 204, 204)',
        linewidth=2,
        autotick=False,
        ticks='outside',
        tickcolor='rgb(204, 204, 204)',
        tickwidth=2,ticklen=5,tickangle=30,
        tickfont=dict(
            family='Arial',
            size=12,
            color='rgb(82, 82, 82)',
        ),
    ),
    yaxis=dict(
        showgrid=False,zeroline=False,showline=False,showticklabels=False,
    ),
    autosize=False,
    margin=dict(
        autoexpand=False,l=100,r=20,t=110,
    ),
    showlegend=False,
)

annotations = []

# Adding labels
for y_trace, label, color in zip([pot_list, weed_list, maint_list, snow_list, refuse_list], labels, colors):
    # labeling the left_side of the plot with the first month's count
    annotations.append(dict(xref='paper', x=0.05, y=y_trace[0],
                                  xanchor='right', yanchor='middle',
                                  text=label + ' {} Requests'.format(y_trace[0]),
                                  font=dict(family='Arial',
                                            size=10,
                                            color=color,),
                                  showarrow=False))
    # labeling the right_side of the plot with the last month's count
    annotations.append(dict(xref='paper', x=0.95, y=y_trace[-1],
                                  xanchor='left', yanchor='middle',
                                  text='{} Requests'.format(y_trace[-1]),
                                  font=dict(family='Arial',
                                            size=10,
                                            color=color,),
                                  showarrow=False))
# Title
annotations.append(dict(xref='paper', yref='paper', x=0.0, y=1.05,
                              xanchor='left', yanchor='bottom',
                              text= title,
                              font=dict(family='Arial',
                                        size=30,
                                        color='rgb(37,37,37)'),
                              showarrow=False))
# Source
annotations.append(dict(xref='paper', yref='paper', x=0.0, y=1,
                              xanchor='left', yanchor='bottom',
                              text='Source: Official PGH 311 Data',
                              font=dict(family='Arial',
                                        size=12,
                                        color='rgb(150,150,150)'),
                              showarrow=False))

layout['annotations'] = annotations

fig = go.Figure(data=traces, layout=layout)
py.iplot(fig, filename='news-source')
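
The five pairs of traces above differ only in their data, color, and name, so they could also be generated in a loop. The sketch below shows the idea with plain dictionaries and made-up counts for two series; the names `series` and `months` are assumptions, not part of the cell above.

```python
# Made-up monthly counts for two request types.
series = {
    "Potholes": [5, 9, 4],
    "Weeds/Debris": [1, 2, 6],
}
months = ["January-17", "February-17", "March-17"]
colors = ["rgba(49,130,189,1)", "rgba(115,115,115,1)"]

traces = []
for (label, counts), color in zip(series.items(), colors):
    # One line through every month of the series.
    traces.append(dict(
        type="scatter", mode="lines",
        x=months, y=counts,
        line=dict(color=color, width=2),
        name=label, hoverinfo="name",
    ))
    # Two markers that highlight the first and last month.
    traces.append(dict(
        type="scatter", mode="markers",
        x=[months[0], months[-1]],
        y=[counts[0], counts[-1]],
        marker=dict(color=color, size=8),
        name=label,
    ))

print(len(traces))  # 4
```

Adding a sixth request type then only requires one more entry in `series` and `colors`.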

Like the pie chart, this is not a pretty graph, but we can still learn a lot from it. For example, each type of request follows a distinct trend throughout the year: Pothole requests rise from January to March and fall afterwards, while Weeds/Debris requests peak during the summer. We can also notice a huge spike in Pothole requests in the most recent month of the data!

Another advantage of interactive plots is that they can present higher-dimensional data easily, which really sets them apart from static 2D plots. However, there aren't any interesting 3D plots we can make from the 311 dataset, so we'll demonstrate with a cool 3D plot of the formula `z = x*y^3 - y*x^3`.
This time we'll use the `Scatter3d` graph object, which takes in `x`, `y`, `z` and creates a scatter plot in 3D.

In [113]:
#Cool 3D plot!
x=[]
y=[]
z=[]
for i in range(20):
    x_t = -1+i/10
    for j in range(20):
        y_t = -1+j/10
        x.append(x_t)
        y.append(y_t)
        z.append(x_t*y_t**3-y_t*x_t**3)

trace_3d = go.Scatter3d(
    x=x, y=y, z=z,
    mode='markers',
    marker=dict(
        size = 6,
        color = z,
        colorscale = 'Viridis',
        opacity=0.8)
)

layout = go.Layout(margin=dict(l=0, r=0, b=0, t=0))

fig = go.Figure(data=[trace_3d], layout=layout)
py.iplot(fig, filename='basic-3d')
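
The nested loops above walk over a 20 x 20 grid. As a small sketch, the same grid can be written with a helper function and list comprehensions; `surface` and `axis` are names chosen here for illustration.

```python
def surface(x, y):
    """The formula plotted above: z = x*y^3 - y*x^3."""
    return x * y**3 - y * x**3

# 20 evenly spaced values from -1.0 up to 0.9, as in the loops above.
axis = [-1 + i / 10 for i in range(20)]

# One (x, y, z) triple per grid point, x varying slowest.
points = [(x, y, surface(x, y)) for x in axis for y in axis]

# Split the triples into the three coordinate lists Scatter3d expects.
xs, ys, zs = (list(column) for column in zip(*points))
print(len(xs))  # 400
```

Swapping `x` and `y` flips the sign of `z`, which is why the plotted surface is antisymmetric about the diagonal `x = y`.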

## Interactive Maps
After getting to know Plotly and all of its capabilities, it is now time to get serious and plot higher-level interactive graphs. In this example we will plot geographic data on a map using latitude and longitude data.

As stated before, `data` is a list of plots we want to draw on the canvas. In this case, we need to create 5 different scatter plots on the map, one for each type of request. To do this, we need to first separate the data into 5 sets of lists.

Also, since the dataset has too many points to plot at once, this part uses only the data points from 2017.

In [139]:
pot_lat = []
pot_lon = []
pot_ts = []
weed_lat = []
weed_lon = []
weed_ts = []
maint_lat = []
maint_lon = []
maint_ts = []
snow_lat = []
snow_lon = []
snow_ts = []
ref_lat = []
ref_lon = []
ref_ts = []

#data only from 2017
for idx, row in new_df.iterrows():
    if (row['mnth_yr'] > 201700 and row['mnth_yr'] < 201800):
        if (row['REQUEST_TYPE'] == "Potholes"):
            pot_lat.append(row['Y'])
            pot_lon.append(row['X'])
            pot_ts.append(row['mnth_yr'])
        elif (row['REQUEST_TYPE'] == "Weeds/Debris"):
            weed_lat.append(row['Y'])
            weed_lon.append(row['X'])
            weed_ts.append(row['mnth_yr'])
        elif (row['REQUEST_TYPE'] == "Building Maintenance"):
            maint_lat.append(row['Y'])
            maint_lon.append(row['X'])
            maint_ts.append(row['mnth_yr'])
        elif (row['REQUEST_TYPE'] == "Snow/Ice removal"):
            snow_lat.append(row['Y'])
            snow_lon.append(row['X'])
            snow_ts.append(row['mnth_yr'])
        elif (row['REQUEST_TYPE'] == "Refuse Violations"):
            ref_lat.append(row['Y'])
            ref_lon.append(row['X'])
            ref_ts.append(row['mnth_yr'])
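
Keeping fifteen separate lists in sync is error-prone. One possible alternative, sketched below with made-up rows, is a single dictionary keyed by request type that holds the latitude, longitude, and timestamp lists together.

```python
from collections import defaultdict

# Made-up rows mimicking the columns used above.
rows = [
    {"REQUEST_TYPE": "Potholes", "X": -79.99, "Y": 40.44, "mnth_yr": 201703},
    {"REQUEST_TYPE": "Weeds/Debris", "X": -79.95, "Y": 40.45, "mnth_yr": 201707},
    {"REQUEST_TYPE": "Potholes", "X": -80.01, "Y": 40.43, "mnth_yr": 201612},
    {"REQUEST_TYPE": "Potholes", "X": -79.97, "Y": 40.46, "mnth_yr": 201711},
]

# Every new request type starts with three empty lists.
points = defaultdict(lambda: {"lat": [], "lon": [], "ts": []})

for row in rows:
    # Same 2017 filter as above, written as a chained comparison.
    if 201700 < row["mnth_yr"] < 201800:
        bucket = points[row["REQUEST_TYPE"]]
        bucket["lat"].append(row["Y"])
        bucket["lon"].append(row["X"])
        bucket["ts"].append(row["mnth_yr"])

print(sorted(points))  # ['Potholes', 'Weeds/Debris']
```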

After separating the data points, we can create the graph objects in `data`. In this case, we'll use `Scattermapbox`. This graph object takes latitude and longitude values and plots them on a map provided by Mapbox. This is easier than plotting maps with other libraries since Mapbox is already integrated with Plotly. However, we'll need an API key from Mapbox (it's also free to use).

Aside from the latitude and longitude values, we also need to pay attention to the `marker` parameter. This field takes in a dictionary and defines the style of the points we plot on the graph. We can change their color, size, shape, etc. to differentiate between different types of data points.

In [143]:
mapbox_access_token = "X"

pot_trace = Scattermapbox(
        lat=pot_lat, lon=pot_lon, mode='markers',
        marker=Marker(
            size=6,
            color="red",
            symbol="circle"
        ),
        text=pot_ts,
        name = "Pothole",
        hoverinfo = 'name'
    )

weed_trace = Scattermapbox(
        lat=weed_lat, lon=weed_lon, mode='markers',
        marker=Marker(
            size=6,
            color="green",
            symbol="circle"
        ),
        text=weed_ts,
        name = "Weeds/Debris",
        hoverinfo = 'name'
    )

maint_trace = Scattermapbox(
        lat=maint_lat, lon=maint_lon, mode='markers',
        marker=Marker(
            size=6, color="blue", symbol="circle"
        ),
        text=maint_ts,
        name = "Building Maintenance",
        hoverinfo = 'name'
    )

snow_trace = Scattermapbox(
        lat=snow_lat, lon=snow_lon, mode='markers',
        marker=Marker(
            size=6, color="yellow", symbol="circle"
        ),
        text=snow_ts,
        name = "Snow/Ice removal",
        hoverinfo = 'name'
    )

ref_trace = Scattermapbox(
        lat=ref_lat, lon=ref_lon, mode='markers',
        marker=Marker(
            size=6, color="purple", symbol="circle"
        ),
        text=ref_ts,
        name = "Refuse Violations",
        hoverinfo = 'name'
    )
data = [pot_trace, weed_trace, maint_trace, snow_trace, ref_trace]

layout = Layout(
    autosize=True,
    hovermode='closest',
    mapbox=dict(
        accesstoken=mapbox_access_token,
        bearing=0,
        center=dict(
            lat=40.44, lon=-79.99 ),
        pitch=0, zoom=11 ),
)

fig = dict(data=data, layout=layout)
py.iplot(fig, filename='Montreal Mapbox')
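
Since the `marker` dictionary is what distinguishes the five map traces, one way to keep the styling in a single place is a label-to-color mapping. The sketch below uses plain dictionaries and made-up coordinates for two request types; `marker_colors` and `coords` are hypothetical names.

```python
# Hypothetical label-to-color mapping; colors follow the cell above.
marker_colors = {"Potholes": "red", "Weeds/Debris": "green"}

# Made-up coordinates near the map center used in the layout.
coords = {
    "Potholes": {"lat": [40.44, 40.45], "lon": [-79.99, -79.98]},
    "Weeds/Debris": {"lat": [40.43], "lon": [-80.00]},
}

# One map trace per request type, all sharing the same marker size.
map_traces = [
    dict(
        type="scattermapbox",
        mode="markers",
        lat=coords[label]["lat"],
        lon=coords[label]["lon"],
        marker=dict(size=6, color=color),
        name=label,
        hoverinfo="name",
    )
    for label, color in marker_colors.items()
]

print([trace["marker"]["color"] for trace in map_traces])
```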

This is already a very informative plot, and we can pinpoint every incident. However, we can still improve the plot in terms of interactivity. For example, plotting everything at once can make it hard to tell the dots apart, so let's add some buttons to select exactly which kinds of requests we want to see.

Buttons can serve many functions in Plotly. Here we want them to `update` the graph so that some pieces of `data` get plotted while others do not.

The buttons are implemented via the `updatemenus` field in `layout`, which creates a panel of buttons; each button decides which parts of `data` to show. In the code below, we set the entries of `'visible'` to `True` only for the data we want to show.

In [153]:
updatemenus = list([
    dict(type="buttons",
         active=-1,
         buttons=list([   
            dict(label = 'Potholes',
                 method = 'update',
                 args = [{'visible': [True, False, False, False, False]},
                         {'title': 'Potholes Only',
                          'annotations': []}]),
            dict(label = 'Weeds/Debris',
                 method = 'update',
                 args = [{'visible': [False, True, False, False, False]},
                         {'title': 'Weeds/Debris Only',
                          'annotations': []}]),
            dict(label = 'Building Maintenance',
                 method = 'update',
                 args = [{'visible': [False, False, True, False, False]},
                         {'title': 'Building Maintenance Only',
                          'annotations': []}]),
            dict(label = 'Snow/Ice removal',
                 method = 'update',
                 args = [{'visible': [False, False, False, True, False]},
                         {'title': 'Snow/Ice removal Only',
                          'annotations': []}]),
            dict(label = 'Refuse Violations',
                 method = 'update',
                 args = [{'visible': [False, False, False, False, True]},
                         {'title': 'Refuse Violations Only',
                          'annotations': []}]),
            dict(label = 'All',
                 method = 'update',
                 args = [{'visible': [True, True, True, True, True]},
                         {'title': 'All Requests',
                          'annotations': []}])]),
    )
])


layout = Layout(
    autosize=True,
    hovermode='closest',
    mapbox=dict(
        accesstoken=mapbox_access_token,
        bearing=0,
        center=dict(
            lat=40.44,
            lon=-79.99
        ),
        pitch=0,
        zoom=11
    ),
    updatemenus = updatemenus
)

fig = dict(data=data, layout=layout)
py.iplot(fig, filename='update')
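
Each button above carries a hand-written list of `True`/`False` flags, one per trace. As a sketch, the same `updatemenus` structure can be generated from the list of labels, which keeps the flags aligned with the order of `data`. The helper `make_button` is a made-up name.

```python
labels = ["Potholes", "Weeds/Debris", "Building Maintenance",
          "Snow/Ice removal", "Refuse Violations"]

def make_button(label, title, visible):
    """Return one 'update' button showing the traces flagged True."""
    return dict(
        label=label,
        method="update",
        args=[{"visible": visible},
              {"title": title, "annotations": []}],
    )

# One button per request type: only the i-th trace is visible.
buttons = [
    make_button(label, label + " Only",
                [i == j for j in range(len(labels))])
    for i, label in enumerate(labels)
]

# A final button that turns every trace back on.
buttons.append(make_button("All", "All Requests", [True] * len(labels)))

updatemenus = [dict(type="buttons", active=-1, buttons=buttons)]
print(len(buttons))  # 6
```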

Lastly, rather than differentiating between types of requests, we can create a slider that shows only the data points from a specific time frame. In this case, we divide the Pothole requests in 2017 by month and use a slider to choose the month we want to see.

A slider is like a button in that it also iterates through the objects in `data` and chooses one to show on the plot. First, we create 12 `Scattermapbox` objects in `data` and set the `visible` field of each to `False`. Then, we use the slider to choose which one to show.

In [148]:
lat_month = []
lon_month = []
for i in range(12):
    lat_month.append([])
    lon_month.append([])

for idx, row in new_df.iterrows():
    if (row['mnth_yr'] > 201700 and row['mnth_yr'] < 201800):
        if (row['REQUEST_TYPE'] == "Potholes"):
            month = row['mnth_yr']%100
            lat_month[month-1].append(row['Y'])
            lon_month[month-1].append(row['X'])
     
data = [Scattermapbox(
        visible = False, lat=lat_month[month], lon=lon_month[month], mode='markers',
        marker=Marker(
            size=6, color="red", symbol="circle" ),
        name = "Pothole in month "+str(month+1),
        hoverinfo = 'name'
    ) for month in range(0, 12)]

steps = []

for i in range(len(data)):
    step = dict(
        method = "restyle",
        args = ['visible', [False]*len(data)]
    )
    step['args'][1][i] = True
    steps.append(step)
    
sliders = [dict(
        active = 10,
        currentvalue = {"prefix": "Month: "},
        pad = {"t": 12}, steps = steps)]

layout = Layout(
    autosize=True,
    hovermode='closest',
    mapbox=dict(
        accesstoken=mapbox_access_token, bearing=0,
        center=dict(
            lat=40.44, lon=-79.99),
        pitch=0, zoom=11
    ),
    sliders = sliders
)

fig = dict(data=data, layout=layout)
py.iplot(fig, filename='slider')
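
Two details of the slider are worth a closer look. Steps without a `label` are shown with generic names, and the `active` index only positions the slider handle; it does not by itself make the matching trace visible. The sketch below, using plain dictionaries, adds month labels and derives the starting visibility flags from the active step. The variable names are chosen here for illustration.

```python
month_names = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
               "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

steps = []
for i, month_name in enumerate(month_names):
    # Show only the i-th trace when the slider reaches this step.
    visible = [False] * len(month_names)
    visible[i] = True
    steps.append(dict(
        method="restyle",
        label=month_name,
        args=["visible", visible],
    ))

active = 10  # the slider handle starts on November

sliders = [dict(active=active,
                currentvalue={"prefix": "Month: "},
                pad={"t": 12},
                steps=steps)]

# Flags to apply to the traces up front, so that the map already
# shows the month the slider handle points at.
initial_visible = steps[active]["args"][1]
print(steps[active]["label"])  # Nov
```

Applying `initial_visible` to the traces before plotting avoids an empty map on first load.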


## Summary and references

This tutorial gave a brief introduction to the Plotly framework and demonstrated some of its capabilities through a simple exploration of the PGH 311 dataset. Plotly offers many more possibilities, both as a web API and as a handy local plotting tool.

The Plotly documentation has everything you need to know about Plotly:
https://plot.ly/python/reference/