In [None]:
from intake.catalog import Catalog
import intake
import numpy as np
import pandas as pd
import holoviews as hv
from holoviews.operation.datashader import datashade

hv.extension('bokeh')

The plotting interface on intake ``DataSource`` objects attempts to mirror the pandas plotting API, but uses HoloViews instead of matplotlib to generate both static and dynamically streaming bokeh plots. To plot streaming data, you can use this interface in a Jupyter notebook or deploy it as a bokeh server app.

For additional information about working and plotting with HoloViews, see the [User Guide](http://holoviews.org/user_guide/index.html). This overview focuses on the high-level plotting API and skips most of the mechanics going on behind the scenes.

We will be focusing on three different datasets:

* A small CSV file of US state-level crime data
* A parquet file of airline data
* A streaming data source of packet capture data

In [None]:
crime_source = intake.open_csv('./CrimeStatebyState.csv')
airline_source = intake.open_parquet('./airline.parq')
# Live packet capture on the 'en0' network interface, read in chunks of 5 packets
pcap_source = intake.open_pcap(None, interface='en0', chunksize=5)

# The plot interface

The ``DataSource.plot`` interface provides a powerful high-level API to generate complex plots. The ``.plot`` API can be called directly or used as a namespace to generate specific plot types.

## The plot method

The first two arguments to the plot API specify the names of columns to plot on the x- and y-axis respectively:

In [None]:
crime_source.plot('Year', 'Violent Crime rate')

All plot methods return HoloViews objects, so we can keep handles on the plots and compose them using the usual HoloViews syntax. Here we use the ``kind`` argument to overlay the 'Violent Crime rate' curve with 'scatter' points:

In [None]:
crime_source.plot('Year', 'Violent Crime rate') *\
crime_source.plot('Year', 'Violent Crime rate', kind='scatter', size=6)

Instead of passing the ``kind`` argument to the plot call, we can also use the ``plot`` namespace, e.g. the ``source.plot.bar`` method to generate a bar plot:

In [None]:
crime_source.plot.bar('Year', 'Violent Crime rate', rot=90)

# Plot types

### Tables

In [None]:
crime_source.plot.table(['Year', 'Population', 'Violent Crime rate'], width=400)

### Scatter

We can also color the data points by another variable; here we color each point by the 'Year':

In [None]:
crime_source.plot.scatter('Violent Crime rate', 'Burglary rate', c='Year', cmap='viridis_r', size=6)

### Bars

In [None]:
crime_source.plot.bar('Year', 'Population', rot=90)

### Histogram

In [None]:
crime_source.plot.hist(y='Violent Crime rate', bin_range=(100, 800), bins=20)

### Area

In [None]:
crime_source.plot.area('Year', columns=['Robbery', 'Aggravated assault'], stacked=True)

### HeatMap

In [None]:
airline_source.plot.heatmap('day', 'carrier', 'depdelay', colorbar=True).aggregate(function=np.mean)

## Distributions

### KDE

In [None]:
airline_source.plot.kde('carrier', 'depdelay', alpha=0.3)\
   .select(carrier=[b'AA', b'US', b'OH']).redim.range(depdelay=(-20, 70))

### Violin Plots

In [None]:
airline_source.plot.violin('carrier', 'depdelay')\
   .select(carrier=[b'AA', b'US', b'OH']).redim.range(depdelay=(-20, 70))

### Box-Whisker Plots

In [None]:
airline_source.plot.box('carrier', 'depdelay')\
   .select(carrier=[b'AA', b'US', b'OH']).redim.range(depdelay=(-10, 70))

# Large data

The previous examples summarized the fairly large airline dataset using statistical plot types. If we instead want to view all of the raw data at once, without summarizing it, we have to use datashader to generate a fixed-size image of the data. Here we plot the 'airtime' against the 'distance':

In [None]:
datashade(airline_source.plot.scatter('distance', 'airtime')).options(width=600)
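
To give a feel for what "a fixed-size image of the data" means, here is a minimal plain-Python sketch that bins points into a grid of counts whose size does not depend on the number of points. It is an illustration of the general idea only, not datashader's implementation; the helper `rasterize_counts` and the toy values are hypothetical.

```python
def rasterize_counts(xs, ys, width, height):
    """Bin (x, y) points into a height x width grid of per-cell counts."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Guard against a zero span when all values on an axis are identical.
    x_span = (x_max - x_min) or 1.0
    y_span = (y_max - y_min) or 1.0
    grid = [[0] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each coordinate to a cell index; clamp the maximum into the last cell.
        col = min(int((x - x_min) / x_span * width), width - 1)
        row = min(int((y - y_min) / y_span * height), height - 1)
        grid[row][col] += 1
    return grid


# Hypothetical toy values standing in for the 'distance' and 'airtime' columns.
distance = [100, 150, 150, 900, 1000]
airtime = [20, 30, 31, 120, 130]
image = rasterize_counts(distance, airtime, width=4, height=2)
```

However many points are added, `image` stays a 2 by 4 grid; only the counts in each cell grow.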

# Streaming data

Given a streaming ``DataSource`` we can also generate streaming plots. To get a dynamically streaming plot we pass ``streaming=True`` as a keyword argument. Two additional parameters control the streaming data: ``backlog`` defines the number of samples to accumulate and display at one time, with old samples being dropped once the backlog size is reached, and ``timeout`` controls how frequently the streaming source is queried for new data:

In [None]:
table = pcap_source.plot.table(streaming=True, backlog=100, timeout=200)
table

The data won't actually start streaming until we start the callback:

In [None]:
table.callback.start()

To stop it again we can call the stop method:

In [None]:
table.callback.stop()
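
The ``backlog`` behaviour described above can be pictured as a bounded buffer: new samples are appended, and the oldest ones fall off once the limit is reached. The following is a minimal sketch of that idea using only the standard library. The `Backlog` class is a hypothetical helper for illustration and is not how the plotting interface is implemented.

```python
from collections import deque


class Backlog:
    """Hypothetical helper: keep only the most recent `backlog` samples."""

    def __init__(self, backlog):
        # A deque with maxlen silently discards the oldest items when full.
        self.samples = deque(maxlen=backlog)

    def push(self, chunk):
        """Append a chunk of new samples and return what would be displayed."""
        self.samples.extend(chunk)
        return list(self.samples)


buffer = Backlog(backlog=5)
first = buffer.push([1, 2, 3])      # below the limit, everything is kept
visible = buffer.push([4, 5, 6, 7])  # limit exceeded, samples 1 and 2 are dropped
```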

## Bar plots

In [None]:
bars = pcap_source.plot.bar('dst_host', 'index', streaming=True, rot=45)
bars.callback.start()
bars.map(lambda x: x.aggregate(function=np.count_nonzero).sort(), hv.Bars)

In [None]:
bars.callback.stop()
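
The aggregation applied to the streaming bars amounts to counting the captured rows per destination host. A rough plain-Python sketch of that counting step is given here, using hypothetical packet rows instead of the pcap source; ordering the result by host name is an assumption of this sketch.

```python
from collections import Counter

# Hypothetical captured rows as (index, dst_host) pairs.
packets = [
    (1, "10.0.0.2"),
    (2, "10.0.0.1"),
    (3, "10.0.0.2"),
    (4, "10.0.0.2"),
]

# Count how many rows were seen for each destination host.
counts = Counter(host for _, host in packets)

# Order the (host, count) pairs by host name for a stable bar order.
bar_data = sorted(counts.items())
```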

## Line plots

In [None]:
from holoviews.operation.timeseries import resample
line = pcap_source.plot('time', 'dst_port', streaming=True, timeout=100)
line.callback.start()
resample(line, function=np.count_nonzero, rule='s')

In [None]:
line.callback.stop()
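
Resampling with a one-second rule and a counting function can be thought of as bucketing packets by the second in which they arrived. This is a minimal standard-library sketch of that bucketing with hypothetical timestamps. Reporting empty seconds as zero is a choice made in this sketch and may differ from what the ``resample`` operation returns.

```python
import math
from collections import Counter

# Hypothetical packet arrival times, in seconds since the capture started.
timestamps = [0.10, 0.45, 0.90, 1.20, 3.05]

# Assign each packet to the whole second in which it arrived.
per_second = Counter(math.floor(t) for t in timestamps)

# Build a regular series covering every second in the observed range.
start, end = min(per_second), max(per_second)
series = [(second, per_second.get(second, 0)) for second in range(start, end + 1)]
```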