<center>
# <img src="https://github.com/pyviz/pyviz/raw/master/notebooks/assets/PyViz_logo_wm.png" width=170>Revealing your data (nearly) effortlessly,<br>at every step in your workflow
<br>
<img src="https://github.com/pyviz/pyviz/raw/master/notebooks/assets/hv_gv_bk_ds_pa.png" width="50%" style="margin: 0px 25%">
</center>

You can use this document either as a talk, using [RISE](https://github.com/damianavila/RISE), or as a regular Jupyter notebook. 

# Workflow from data to decision
<img src="https://github.com/pyviz/pyviz/raw/master/notebooks/assets/workflow.png" width=40% align="left" style="margin: 0px 20px">
<br>
If there's no visualization at any of these stages, you're flying blind.<br><br>

What if it were simple to visualize anything, anywhere?

<img src="https://github.com/pyviz/pyviz/raw/master/notebooks/assets/landscape_hv_nx.png" width=65% align="left" style="margin: 0px 30px">
## Good news:<br><br>Lots of choices!
<br><br><br><small>(adapted from Jake VanderPlas)</small>

<img src="https://github.com/pyviz/pyviz/raw/master/notebooks/assets/landscape_hv_nx.png" width=65% align="left" style="margin: 0px 30px">

## Bad news:<br><br>Lots of choices!
<br><br>
Too hard to
try them all,
learn them all, or 
get them to work together.

<img src="https://github.com/pyviz/pyviz/raw/master/notebooks/assets/landscape_hv_nx_pyviz.png" width=65% align="left" style="margin: 0px 30px">

## PyViz:
<br><br>
Seamless interoperability<br>for browser-based<br>viz tools

Supported by Anaconda, Inc.

In [None]:
import numpy as np
import pandas as pd
import holoviews as hv
import datashader.transfer_functions as tf

%matplotlib inline

hv.extension('bokeh', 'matplotlib', width="100")
%opts Curve [width=600 height=250 tools=['hover'] ] {+framewise} VLine (color="black")
%opts Bars  [width=800 height=400 tools=['hover'] group_index=1 legend_position='top_left' xrotation=90]

# Exploring Pandas Dataframes

If your data is in a Pandas dataframe, it's natural to explore it using the ``.plot()`` method (based on Matplotlib).  Let's look at a [dataset of the number of cases of measles and pertussis](http://graphics.wsj.com/infectious-df-and-vaccines/#b02g20t20w15) (per 100,000 people) over time in each state:

In [None]:
df = pd.read_csv('../data/diseases.csv.gz')
df.head()

Just calling ``.plot()`` won't give anything meaningful, because it doesn't know what should be plotted against what:

In [None]:
df.plot();

But with some Pandas operations we can pull out parts of the data that make sense to plot:

In [None]:
# numpy is already imported above; the built-in .sum() gives the same per-year totals
measles_by_year = df[["Year","measles"]].groupby("Year").sum()
measles_by_year.plot();

Here it is easy to see that the 1963 introduction of a measles vaccine brought the cases down to negligible levels.
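
The aggregation used above can be checked on a tiny example. This is a minimal sketch on an invented dataframe that only borrows the column names from the notebook; the numbers are made up and are not taken from `diseases.csv.gz`.

```python
import pandas as pd

# Hypothetical toy data with the same column names as the notebook's dataframe;
# the values are invented for illustration only.
toy = pd.DataFrame({
    "Year":    [1960, 1960, 1961, 1961],
    "State":   ["Texas", "New York", "Texas", "New York"],
    "measles": [10.0, 20.0, 5.0, 15.0],
})

# Sum the per-state values within each year; "Year" becomes the index.
toy_by_year = toy[["Year", "measles"]].groupby("Year").sum()
print(toy_by_year)
```

Each year ends up as one row holding the total over all states, which is what makes a single line plot meaningful.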

By default, the tools below ignore the Pandas index, so we'll make it into a real column for the rest of this notebook:

In [None]:
measles_by_year = measles_by_year.reset_index()
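
To see what `reset_index()` changes, here is a small sketch on an invented grouped result (hypothetical values, same column names as above): before the call, `Year` lives only in the index; afterwards it is an ordinary column.

```python
import pandas as pd

# Invented stand-in for a grouped result: "Year" is the index, not a column.
toy_by_year = pd.DataFrame(
    {"measles": [30.0, 20.0]},
    index=pd.Index([1960, 1961], name="Year"),
)
print(list(toy_by_year.columns))

# reset_index() moves the index into a regular column and
# replaces it with a default integer index.
flat = toy_by_year.reset_index()
print(list(flat.columns))
```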

# Exploring Data with HoloViews and Bokeh

The above plots are just static images, but it can be just as simple to get fully interactive plots in a web browser, with hover, pan, and zoom, by using HoloViews to get a Bokeh plot:

In [None]:
hv.Curve(measles_by_year)

Here [Bokeh](http://bokeh.pydata.org) makes rich JavaScript-based interactive plots from within Python, accessed using the simple data-centric API of [HoloViews](http://holoviews.org).

With HoloViews, you can easily add to the plot to capture your understanding as you explore:

In [None]:
m = hv.Curve(measles_by_year)  *  hv.VLine(1963)  *  \
    hv.Text(1963, 27000, " Vaccine introduced", halign='left')
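print(m)  # textual summary of the overlay's structure
m         # must be the last expression in the cell for the plot to display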

while still always being able to access the original data involved for further analysis:

In [None]:
m.Curve.I.data.head()

With other plotting libraries, each plot you make will be a dead end, discouraging you from investing in it, but HoloViews objects preserve the full data throughout plotting, slicing, sampling, and other operations.

It's also easy to break down the data in different ways, such as looking at each state individually:

In [None]:
ds = hv.Dataset(df, ['Year', 'State'], 'measles').aggregate(function=np.sum)
measles_by_state = ds.to(hv.Curve, 'Year', 'measles')
measles_by_state * hv.VLine(1963)

Or pull out a couple of those to put side by side:

In [None]:

measles_by_state["Texas"] + measles_by_state["New York"]

Or to compare four states over time by overlaying:

In [None]:
states = ['New York', 'New Jersey', 'California', 'Texas']
measles_by_state.select(State=states, Year=(1930, 2005)).overlay() * hv.VLine(1963)

Or by faceting:

In [None]:
%%opts Curve [width=200, height=100]
measles_by_state.select(State=states, Year=(1930, 2005)).grid('State') * hv.VLine(1963)

Or as Bars or many other types of plots:

In [None]:
ds.select(State=states, Year=(1980, 1990)).to(hv.Bars, ['Year', 'State'], 'measles').sort()

Or with error bars:

In [None]:
agg = ds.aggregate('Year', function=np.mean, spreadfn=np.std)
(hv.Curve(agg) * hv.ErrorBars(agg,vdims=['measles', 'measles_std'])).redim.range(measles=(0, None)) * hv.VLine(1963)
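
The error bars above summarize, for each year, the mean across states and the spread around it. A rough pandas equivalent on invented numbers is sketched below. It is an illustration of the idea, not the HoloViews implementation, and it uses the population standard deviation (`ddof=0`, NumPy's default for `np.std`) explicitly as an assumption.

```python
import pandas as pd

# Invented toy data; column names follow the notebook, values do not.
toy = pd.DataFrame({
    "Year":    [1960, 1960, 1961, 1961],
    "State":   ["Texas", "New York", "Texas", "New York"],
    "measles": [10.0, 20.0, 5.0, 15.0],
})

grouped = toy.groupby("Year")["measles"]

# One row per year: mean across states, and the spread around that mean.
summary = pd.DataFrame({
    "measles":     grouped.mean(),
    "measles_std": grouped.std(ddof=0),
}).reset_index()
print(summary)
```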

If we really want to invest a lot of time in making a fancy plot, we can customize it to try to show *all* the yearly data about measles at once:

In [None]:
url = 'https://raw.githubusercontent.com/blmoore/blogR/master/data/measles_incidence.csv'
data = pd.read_csv(url, skiprows=2, na_values='-')

yearly_data = data.drop('WEEK', axis=1).groupby('YEAR').sum().reset_index()
measles = pd.melt(yearly_data, id_vars=['YEAR'], var_name='State', value_name='Incidence')

heatmap = hv.HeatMap(measles, label='Measles Incidence')
aggregate = hv.Dataset(heatmap).aggregate('YEAR', np.mean, np.std)

marker = hv.Text(1963, 800, u'\u2193 Vaccine introduced', halign='left')

agg = hv.ErrorBars(aggregate) * hv.Curve(aggregate).opts(plot=dict(xrotation=90))

hm_opts = dict(width=900, height=500, tools=['hover'], logz=True, invert_yaxis=True,
               xrotation=90, labelled=[], toolbar='above', xaxis=None)
overlay_opts = dict(width=900, height=200, show_title=False)
vline_opts = dict(line_color='black')

opts = {'HeatMap': {'plot':  hm_opts}, 
        'Overlay': {'plot':  overlay_opts}, 
        'VLine':   {'style': vline_opts}}

In [None]:
(heatmap + agg * marker).opts(opts).cols(1)
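
The reshaping step in the heatmap cell relies on `pd.melt` to turn a table with one column per state into long form, with one row per (year, state) pair. A minimal sketch on an invented two-state table (the real CSV has many more columns):

```python
import pandas as pd

# Invented wide table: one row per year, one column per state.
wide = pd.DataFrame({
    "YEAR":     [1960, 1961],
    "TEXAS":    [10.0, 5.0],
    "NEW YORK": [20.0, 15.0],
})

# Long form: the former column names go into "State", their values into "Incidence".
long_form = pd.melt(wide, id_vars=["YEAR"], var_name="State", value_name="Incidence")
print(long_form)
```

The long form is what lets a heatmap treat year and state as its two axes and incidence as the cell value.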

By the way, the only thing about any of this that's specific to Bokeh is being able to interact with elements of the plot; HoloViews can use Matplotlib instead of Bokeh to generate any of the plots if we don't need zoom, hover, etc.:

In [None]:
%%output backend='matplotlib' 
measles_by_state * hv.VLine(1963) * hv.Text(1963, 1000, "  Vaccine introduced", halign='left')

As you can see, there are lots of options for getting quick plots to explore your data in a browser, and if you choose HoloViews+Bokeh plots, you can have full interactivity with very little code to explore even quite complex datasets.

# Interactive statistical plots

For high-dimensional datasets with additional data variables, we can compose all the above faceting methods as needed.

For instance, let's look at the Iris dataset:

In [None]:
from holoviews.operation import gridmatrix
from bokeh.sampledata.iris import flowers as iris

iris.tail()

We can look at all these relationships at once, interactively:<span style="display:block; margin-top:-12px;"> </span>

In [None]:
%%opts Bivariate [bandwidth=0.5] (cmap=Cycle(values=['Blues', 'Reds', 'Oranges'])) 
%%opts Points    [tools=['box_select','lasso_select']] (size=2 alpha=0.7)  NdOverlay [batched=False]
iris_ds      = hv.Dataset(iris).groupby('species').overlay()
density_grid = gridmatrix(iris_ds, diagonal_type=hv.Distribution, chart_type=hv.Bivariate)
point_grid   = gridmatrix(iris_ds, chart_type=hv.Points)
density_grid * point_grid

# Dealing with large data and geo data

PyViz is a modular suite of tools, and when you need capabilities not handled by Bokeh and HoloViews as above, you can bring those in:
 
* [**GeoViews**](http://geo.holoviews.org): Visualizable geographic HoloViews objects
* [**Datashader**](http://datashader.org): Rasterizing huge HoloViews objects to images quickly
* [**Param**](https://ioam.github.io/param): Declaring user-relevant parameters, making it simple to work with widgets inside and outside of a notebook context
* [**Colorcet**](http://bokeh.github.io/colorcet): Perceptually uniform colormaps for big data

Let's look at a large(ish) dataset of 10 million taxi trips on a map.

In [None]:
import holoviews as hv, geoviews as gv, dask.dataframe as dd, cartopy.crs as crs
from colorcet import fire
from holoviews.operation.datashader import datashade

df = dd.read_parquet('../data/nyc_taxi_wide.parq').persist()
options = dict(width=700, height=600, xaxis=None, yaxis=None, bgcolor='black')
points = hv.Points(df, ['pickup_x', 'pickup_y'])
taxi_trips = datashade(points, x_sampling=0.5, y_sampling=0.5, cmap=fire).opts(plot=options)
url = 'https://server.arcgisonline.com/ArcGIS/rest/services/World_Imagery/MapServer/tile/{Z}/{Y}/{X}.jpg'
tiles = gv.WMTS(url, crs=crs.GOOGLE_MERCATOR)
tiles * taxi_trips

As you can see, you can specify geo plots easily with GeoViews, and if your HoloViews objects are too big to visualize in a browser directly, you can add `datashade()` to render them into images dynamically on zooming, etc.
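
The idea behind rendering points into an image can be sketched with plain NumPy: bin the points into a fixed-size grid and color each cell by its count, so the amount of data sent to the browser depends on the image size rather than on the number of points. This is a conceptual stand-in using random points and `np.histogram2d`, not Datashader's actual implementation or API.

```python
import numpy as np

# Synthetic points standing in for pickup locations (random, not taxi data).
rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = rng.normal(size=100_000)

# Aggregate the points into a 60 x 70 grid of counts over a fixed viewport.
counts, xedges, yedges = np.histogram2d(
    x, y, bins=(60, 70), range=[[-3, 3], [-3, 3]]
)

# Compress the dynamic range so sparse cells remain visible next to dense ones.
image = np.log1p(counts)
print(image.shape)
```

Zooming corresponds to repeating the aggregation with a smaller `range`, which is why the image has to be recomputed dynamically rather than simply stretched.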

You can also easily add widgets to control filtering, selection, and other options interactively, either here in the notebook or in a standalone server:

In [None]:
import param, parambokeh
from colorcet import cm_n
from holoviews.streams import RangeXY

url='https://server.arcgisonline.com/ArcGIS/rest/services/World_Imagery/MapServer/tile/{Z}/{Y}/{X}.jpg'
tiles = gv.WMTS(url,crs=crs.GOOGLE_MERCATOR)
opts = dict(width=1000,height=600,xaxis=None,yaxis=None,bgcolor='black',show_grid=False)

class NYCTaxiExplorer(hv.streams.Stream):
    alpha      = param.Magnitude(default=0.75, doc="Alpha value for the map opacity")
    colormap   = param.ObjectSelector(default=cm_n["fire"], objects=cm_n.values())
    location   = param.ObjectSelector(default='dropoff', objects=['dropoff', 'pickup'])

    def make_view(self, x_range, y_range, **kwargs):
        map_tiles = tiles.options(alpha=self.alpha, **opts)
        points = hv.Points(df, [self.location+'_x', self.location+'_y'])
        taxi_trips = datashade(points, x_sampling=0.5, y_sampling=0.5, cmap=self.colormap,
                               dynamic=False, x_range=x_range, y_range=y_range, width=1000, height=600)
        return map_tiles * taxi_trips

In [None]:
explorer = NYCTaxiExplorer(name="NYC Taxi Trips")
parambokeh.Widgets(explorer, callback=explorer.event)
hv.DynamicMap(explorer.make_view, streams=[explorer, RangeXY()])

As you can see, the PyViz tools let you integrate visualization into everything you do, using a small amount of code that reveals your data's properties and captures your understanding of it.

Check out [pyviz.org](http://pyviz.org) for a detailed tutorial covering all of this material, and see each of the individual project sites for a wealth of examples and galleries.

This notebook and set of slides are archived at: https://anaconda.org/jbednar/bednar_index_2017/

Thanks to all the PyViz contributors, including James Bednar, Philipp Rudiger, Jean-Luc Stevens, Bryan Van de Ven, Mateusz Paprocki, Joseph Crail, Greg Brener, and Chris Ball.