<a href='http://www.holoviews.org'><img src="../../assets/hv+bk.png" alt="HV+BK logos" width="40%;" align="left"/></a>
<div style="float:right;"><h2>Exercise 4: Dynamic Interactions</h2></div>

In [None]:
import numpy as np  # noqa
import pandas as pd
import holoviews as hv
from holoviews import opts, dim
from holoviews.element.tiles import OSM

hv.extension('bokeh')
opts.defaults(opts.RGB(width=600, height=600))

## Exercise 1

In [None]:
diamonds = pd.read_csv('../../data/diamonds.csv')

As should be second nature for us now, we will look at this dataframe before we start doing anything.

In [None]:
diamonds.head()

Next we will display a static plot of 'carat' vs. 'price' as we did in the first exercise, alongside a BoxWhisker plot of the distributions.

In [None]:
scatter_opts = opts.Scatter(width=600, height=400, logy=True, tools=['box_select'], 
                            color=dim('cut'), size=1.5, cmap='tab20c')
scatter = hv.Scatter(diamonds.sample(10000), 'carat', ['price', 'cut', 'clarity']).select(carat=(0, 3))
boxwhisker = hv.BoxWhisker(scatter, 'clarity', 'price')

scatter.opts(scatter_opts) + boxwhisker

By default, the <code>BoxWhisker</code> element here will statically display the whole distribution. But if you try out the "Box select" tool, you can select a subset of the Scatter points.  Can we link the boxwhisker plot to selections made on the <code>Scatter</code> plot, so that we can see distributions in that particular region of the data space? Yes, as long as we have these three things:

1. A stream that collects selection events from the <code>scatter</code> object
2. A callback that constructs a HoloViews element from the given selection and returns it
3. A DynamicMap that runs the callback each time a new selection is available
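
How these three pieces fit together can be sketched without HoloViews at all. The toy classes below (`ToyStream` and `ToyDynamicMap`, both hypothetical names invented for this illustration) only imitate the pattern: the stream stores the latest values and notifies its subscribers, and the map re-runs the callback on every event. They are not how HoloViews is implemented.

```python
class ToyStream:
    """Hypothetical stand-in for a stream: stores values, notifies subscribers."""

    def __init__(self, **initial):
        self.contents = dict(initial)
        self.subscribers = []

    def event(self, **values):
        # Record the new values, then ask every subscriber to refresh.
        self.contents.update(values)
        for notify in self.subscribers:
            notify()


class ToyDynamicMap:
    """Hypothetical stand-in for a DynamicMap: re-runs a callback per event."""

    def __init__(self, callback, streams):
        self.callback = callback
        self.streams = streams
        self.current = None
        for stream in streams:
            stream.subscribers.append(self.refresh)
        self.refresh()

    def refresh(self):
        # Merge the contents of all streams into keyword arguments.
        kwargs = {}
        for stream in self.streams:
            kwargs.update(stream.contents)
        self.current = self.callback(**kwargs)


def describe_selection(index):
    # A real callback would return a HoloViews element instead of a string.
    return f"{len(index)} points selected"


selection_stream = ToyStream(index=[])
view = ToyDynamicMap(describe_selection, streams=[selection_stream])
initial_view = view.current

selection_stream.event(index=[3, 7, 11])  # simulate a box selection
updated_view = view.current
```

In the exercise, `Selection1D` plays the role of the stream, your function is the callback, and `hv.DynamicMap` does the re-running.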

For step 1, we provide the <code>scatter</code> object as the source for a <code>Selection1D</code> stream that will provide the <code>index</code> of all the selected points:

In [None]:
selection = hv.streams.Selection1D(source=scatter)

For step 2, write a function that can accept the <code>index</code> values, select those values from the original dataset, and return the appropriate HoloViews element; something like:


```python
def selection_boxwhisker(index):
    selection = scatter.iloc[index] if len(index)>0 else scatter
    return ...some hv element built from the selection...
```


Here <code>selection_boxwhisker</code> should return a <code>BoxWhisker</code> element for the selection, plotting 'price' against 'clarity'. 
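
The selection logic can be tried out on a plain dataframe before wiring it into a callback. The sketch below uses a tiny made-up table (not the diamonds data) and a hypothetical helper named `select_rows`. It shows positional selection with `iloc`, the fallback to the full table when nothing is selected, and a per-category median of the kind a box-whisker plot summarizes.

```python
import pandas as pd

# Made-up stand-in for the diamonds table, for illustration only.
toy = pd.DataFrame({
    "clarity": ["SI1", "SI1", "VS1", "VS1", "VS1", "IF"],
    "price": [400, 600, 900, 1100, 1300, 2500],
})


def select_rows(df, index):
    """Return the rows at the given positions, or every row if none are selected."""
    return df.iloc[index] if len(index) > 0 else df


everything = select_rows(toy, [])      # empty selection falls back to all rows
subset = select_rows(toy, [0, 2, 3])   # positions, as a selection stream would report

# Median price per clarity grade within the selection.
subset_medians = subset.groupby("clarity")["price"].median()
```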

For step 3, define a <code>DynamicMap</code> using the <code>selection</code> stream and your custom callback and lay it out next to the <code>scatter</code> object as above.

<details><summary href="#hint1">Hint</summary>

A `DynamicMap` requires a callback function as its first argument and streams should be supplied in a list as a keyword argument.

</details>

<details><summary href="#solution1">Solution</summary>

```python

selection = hv.streams.Selection1D(source=scatter)

def selection_boxwhisker(index):
    selection = scatter.iloc[index] if len(index)>0 else scatter
    return hv.BoxWhisker(selection, 'clarity', 'price')

scatter + hv.DynamicMap(selection_boxwhisker, streams=[selection])
```
</details>

## Exercise 2: Streaming Data

Exercise 1 used HoloViews streams to collect user interaction events (selections). Here, let's use them to view data sources that themselves are updating over time.

First, let's set up a (simulated) streaming data source in the form of taxi pickup locations. The code below splits the taxi dataset into chunks by hour, which will be emitted one by one to emulate a live, streaming data source.

In [None]:
import time
from itertools import cycle
from holoviews.operation.datashader import datashade  # noqa

def taxi_trips_stream(source='../../data/nyc_taxi_wide.parq', frequency='H'):
    """Generate dataframes grouped by given frequency"""
    def get_group(resampler, key):
        try:
            # Keep the datetime index: later cells read the time range from it.
            df = resampler.get_group(key)
        except KeyError:
            # No trips fell into this time bin.
            df = pd.DataFrame()
        return df

    df = pd.read_parquet(source,
                         columns=['tpep_pickup_datetime', 'pickup_x', 'pickup_y', 'fare_amount'],
                         engine='fastparquet')
    df = df.set_index('tpep_pickup_datetime', drop=True)
    df = df.sort_index()
    r = df.resample(frequency)
    chunks = [get_group(r, g) for g in sorted(r.groups)]
    indices = cycle(range(len(chunks)))
    while True:
        yield chunks[next(indices)]

trips = taxi_trips_stream()
example = next(trips)
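
Because the generator cycles over its chunks, it never runs out: after the last hour it starts again from the first. A minimal sketch of that replay pattern, using placeholder strings in place of dataframes and a hypothetical helper named `replay`:

```python
from itertools import cycle, islice


def replay(chunks):
    """Yield the given chunks in order, starting over after the last one."""
    for chunk in cycle(chunks):
        yield chunk


hours = ["00:00", "01:00", "02:00"]  # placeholder chunks; real ones would be dataframes
first_five = list(islice(replay(hours), 5))
```

Here `first_five` wraps around to the first chunk once the three placeholders are used up, which is why `next(trips)` can be called as often as needed in the cells below.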

As usual, let's start by inspecting the data, in this case the initial chunk emitted above:

In [None]:
example.head()

To build our streaming visualization, first declare a map tile source for a background plot, and then make a <code>Pipe</code> stream initialized with the example chunk of data already emitted:

In [None]:
tiles = OSM()
pipe = hv.streams.Pipe(example)

Then you will need to define a callback to use when declaring a <code>DynamicMap</code>. This function will need to accept a chunk of data, then return a <code>Points</code> object displaying the 'pickup_x' and 'pickup_y' coordinates and a <code>label</code> indicating the time range being covered. Something like:


```python
def hourly_points(data):
    label = '%s - %s' % (str(data.index.min()), str(data.index.max()))
    return ...some hv object using the given data...
```

Finally, use that callback and the <code>pipe</code> stream to define a <code>DynamicMap</code>, applying the datashade operation to the DynamicMap and then overlaying it on top of the <code>tiles</code>. 

**Warning**: Do not display the <code>DynamicMap</code> without applying the <code>datashade()</code> operation, or you run the risk of freezing your browser.
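
One way to see why datashading helps: the points are aggregated into a fixed-size raster before anything reaches the browser, so the amount of data sent depends on the image size and not on the number of points. The sketch below imitates that aggregation step with `np.histogram2d` on synthetic points. It is only an analogy for what `datashade()` does, not Datashader's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# One million made-up points; no real taxi data is used here.
x = rng.normal(size=1_000_000)
y = rng.normal(size=1_000_000)

# Count points per cell on a fixed 600 x 600 grid.
counts, x_edges, y_edges = np.histogram2d(x, y, bins=600)
```

However many points go in, `counts` stays a 600 x 600 array, which is the kind of fixed-size result that is cheap to display.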

<details><summary href="#hint2">Hint</summary>

To apply datashading, simply call `datashade(dynamicmap)`.

</details>

You should now see a map of New York City with the taxi trips on top. Run the next cell to send events to the <code>Pipe</code> and update the plot.

In [None]:
for i in range(100):
    time.sleep(0.05)
    pipe.send(next(trips))

<details><summary href="#solution2">Solution</summary>

```python
    
pipe = hv.streams.Pipe(example)
tiles = OSM()

def hourly_points(data):
    label = '%s - %s' % (str(data.index.min()), str(data.index.max()))
    return hv.Points(data, ['pickup_x', 'pickup_y'], label=label)

points = hv.DynamicMap(hourly_points, streams=[pipe])
tiles * datashade(points).opts(opts.RGB(width=600, height=600))
```
</details>   

## Exercise 3

In the previous exercise we used the <code>Pipe</code> stream, which emits just the latest chunk. That's a good way to monitor an ongoing stream, but often you'll instead want to accumulate data over time, showing the latest chunk combined with previous chunks. Here we will stream data using the <code>Buffer</code> stream, which accumulates incoming rows up to a maximum <code>length</code> and then discards the oldest rows as new ones arrive. We will start by defining some options, an example dataframe, and the <code>Buffer</code> stream with a length of 1,000,000:

In [None]:
opts.defaults(
    opts.Curve(width=800, height=400, color='black', line_width=1, framewise=True), 
    opts.Scatter(color='red'))

from holoviews.operation.timeseries import resample, rolling_outlier_std  # noqa
example = next(trips)[['fare_amount']]
buffer = hv.streams.Buffer(example, length=1000000)
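
The difference between keeping only the latest chunk and keeping a bounded history can be illustrated with the standard library alone. The sketch below is a toy analogy built on `collections.deque`, with made-up rows and a deliberately tiny limit of 4. It does not use or reproduce the HoloViews streams themselves.

```python
from collections import deque

chunks = [[1, 2], [3, 4], [5, 6]]  # made-up chunks of rows

latest = None              # Pipe-like: holds only the most recent chunk
history = deque(maxlen=4)  # Buffer-like: holds at most the 4 most recent rows

for chunk in chunks:
    latest = chunk
    history.extend(chunk)  # the oldest rows fall off once maxlen is reached

latest_rows = latest
buffered_rows = list(history)
```

After the loop, `latest_rows` holds only the last chunk, while `buffered_rows` holds the four most recent rows across chunks.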

As before, you'll need to complete the callback function so it returns an element.  In this case, we need a <code>Curve</code> plotting the 'fare_amount' against the 'tpep_pickup_datetime', starting something like:


```python
def fare_curve(data):
    ...
```

Again as before, we need to define a <code>DynamicMap</code> that uses this callback in combination with a stream (<code>buffer</code> in this case).  Here let's assign it to a variable rather than try to show it right away:

Next, apply the <code>resample</code> operation to the DynamicMap object, with <code>rule='T'</code> and <code>function=np.sum</code>, and then apply the <code>rolling_outlier_std</code> operation to the output of that. Finally, display an overlay of the <code>resample</code> output and the <code>rolling_outlier_std</code> output.
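
To get a feel for what these two steps compute, the same idea can be tried in plain pandas on made-up fares. The sketch below sums fares per minute and then flags points that lie far from a rolling mean, measured in rolling standard deviations of the residuals. It approximates the idea behind the two operations and is not the HoloViews implementation; the window and sigma values are arbitrary choices, and the rule is spelled `'min'` because newer pandas versions deprecate the equivalent `'T'` alias.

```python
import pandas as pd

# Made-up fares: one trip every 20 seconds for 30 minutes, 10 units each.
times = pd.date_range("2015-01-01", periods=90, freq=pd.Timedelta(seconds=20))
fares = pd.Series(10.0, index=times)
spike_minute = pd.Timestamp("2015-01-01 00:20")
fares.loc[spike_minute] = 500.0  # inject one unusually large fare

# Per-minute totals.
minutely = fares.resample("min").sum()

# Flag points whose distance from the rolling mean exceeds
# sigma rolling standard deviations of the residuals.
window, sigma = 10, 2
residual = minutely - minutely.rolling(window).mean()
outliers = residual.abs() > sigma * residual.rolling(window).std()
```

In this toy series the minute containing the injected fare is flagged, while minutes without enough history for the rolling window are never flagged.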

<details><summary href="#hint3">Hint</summary>

Operations like <code>resample</code> and <code>rolling_outlier_std</code> can be chained, e.g.:
<br><br>

```python
resampled = resample(dmap)
outliers = rolling_outlier_std(resampled)
resampled * outliers
```

</details>

Now that you've displayed the plot, let's start sending some data to the buffer, which will accumulate up to 1,000,000 trips:

In [None]:
for i in range(100):
    time.sleep(0.1)
    buffer.send(next(trips)[['fare_amount']])

<details><summary href="#solution3">Solution</summary>

<br>

```python
example = next(trips)[['fare_amount']]
buffer = hv.streams.Buffer(example, length=1000000)
    
def fare_curve(data):
    return hv.Curve(data, 'tpep_pickup_datetime', 'fare_amount')

fares = hv.DynamicMap(fare_curve, streams=[buffer])    
minutely = resample(fares, rule='T', function=np.sum)
minutely * rolling_outlier_std(minutely, rolling_window=10)
```

</details>