Lesson 15: A First Look at Interactive Visualization
====================
---
Prof. James Sharpnack<br>
Statistics Department, UC Davis<br>
&copy; 2018

The following taxonomy of interactive dynamics for visualization is from [Interactive Dynamics for
Visual Analysis](https://queue.acm.org/detail.cfm?id=2146416) by Heer and Shneiderman.  It will help us think about the different tools that we will see in mpld3, which renders matplotlib figures in the browser using D3.  D3 itself is a much more extensive JavaScript library for making interactive visualizations that work in the browser.

### Data & View Specification 
- **Visualize** data by choosing visual encodings.
- **Filter** out data to focus on relevant items.
- **Sort** items to expose patterns.
- **Derive** values or models from source data.

Many of these functions are delegated to matplotlib.  For example, most of our visualizations come from the chart types that pyplot provides.  In mpld3, the figure and axes objects are serialized and sent to D3 for interactive visualization.  Many of the other processes (sorting, deriving, and filtering) are accomplished through query widgets, where HTML form boxes are used to modify the plot.

### View Manipulation 
- **Select** items to highlight, filter, or manipulate them.
- **Navigate** to examine high-level patterns and low-level detail.
- **Coordinate views** for linked, multi-dimensional exploration.
- **Organize** multiple windows and workspaces.

Selection is one of the most important tools that we will use in interactive visualization.  In mpld3 we will mostly use plugins that display information about elements when you hover over them.  Navigation and filtering are mostly accomplished in mpld3 by zooming and brushing.  D3 can help you link different plots to the same data, and that will help you coordinate views and organize your plots.  We only see that functionality in mpld3 with the brushing plugin.

### Process & Provenance 
- **Record analysis** histories for revisitation, review and sharing.
- **Annotate patterns** to document findings.
- **Share** views and annotations to enable collaboration.
- **Guide users** through analysis tasks or stories.

Most of these are delegated to the Jupyter notebook for us.

In [2]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import mpld3
from mpld3 import plugins
import matplotlib.dates as mdates

In [3]:
TUIT = pd.read_csv('data/TUIT.csv')
intuitnames = [cname for cname in TUIT.columns.values if 'TUITIONFEE_IN2' in cname]
outtuitnames = [cname for cname in TUIT.columns.values if 'TUITIONFEE_OUT2' in cname]
tuityears = np.arange(2001,2016)
TUIT = TUIT.set_index('INSTNM')

In [4]:
fig, ax = plt.subplots(figsize = (8,6))

# One line per institution; the tooltip shows the institution name on hover
for instnm, row in TUIT[intuitnames].iterrows():
    line = ax.plot(tuityears,pd.to_numeric(row),alpha=.2,label=instnm)[0]
    linelabs = plugins.LineLabelTooltip(line,instnm)
    plugins.connect(fig, linelabs)
    
ax.set_ylabel('dollars per year')
ax.set_title('In-state tuition')
mpld3.display(fig)

In [5]:
fig, (ax1, ax2) = plt.subplots(1,2, sharey = True, figsize=(12,5))

for instnm, row in TUIT[intuitnames].iterrows():
    l = ax1.plot(tuityears,pd.to_numeric(row),'k',alpha=.1,label=instnm)[0]
    linelabs = plugins.LineLabelTooltip(l,instnm)
    plugins.connect(fig, linelabs)

for instnm, row in TUIT[outtuitnames].iterrows():
    l = ax2.plot(tuityears,pd.to_numeric(row),'k',alpha=.1,label=instnm)[0]
    linelabs = plugins.LineLabelTooltip(l,instnm)
    plugins.connect(fig, linelabs)

ax1.set_ylabel('dollars per year')
ax1.set_title('In-state tuition')
ax2.set_title('Out-of-state tuition')
mpld3.display(fig)

In [6]:
EARN = pd.read_csv('data/EARN.csv')

In [7]:
fig, ax = plt.subplots(figsize=(10,8))

labels = list(EARN['INSTNM'])
scat = ax.scatter(EARN.iloc[:,2],EARN.iloc[:,3],s=EARN.iloc[:,1]/500,
            c='b',edgecolors='w')
ax.set_xlim([0,50000])
ax.set_xlabel('In-state tuition fees (USD/year)')
ax.set_ylabel('Mean earnings after 10 yrs (USD/year)')
ax.set_title('2014: Mean earnings as a function of tuition')

pointlabs = plugins.PointLabelTooltip(scat,labels)
plugins.connect(fig, pointlabs)

mpld3.display(fig)

In [8]:
flights = pd.read_csv('data/flight_red.tsv',sep='\t',header=None)

In [9]:
colnames = ['Origin','Dest','Origin City','Dest City',
        'Passengers','Seats','Flights','Distance','Date','Origin Pop','Dest Pop']
flights.columns = colnames
flights['Date'] = flights['Date'].astype(str)
flights['Date'] = pd.to_datetime(flights['Date'],format="%Y%m")
flights = flights.set_index('Date')
flights = flights.to_period(freq='M')
# months = flights[['Flights','Passengers']].groupby(level=0).sum()

In [10]:
fig, ax = plt.subplots(figsize=(10,8))
codes = ['SFO','BOS','JFK','SEA','IAH']
for APC in codes:
    # Sum only the numeric column we plot, so text columns are not aggregated
    months = flights.loc[flights['Origin']==APC, ['Passengers']].groupby(level=0).sum()
    months.plot(y='Passengers', ax=ax, legend=False)
    
interactive_legend = plugins.InteractiveLegendPlugin(ax.get_lines(),codes)
plugins.connect(fig, interactive_legend)
plt.subplots_adjust(right=.8)
ax.set_ylabel('Passengers')
ax.set_title('Monthly Passenger Counts Leaving Airports')
mpld3.display(fig)