## Example of custom callback using HoloViews linked streams

**Authors:** Matias C. Kind, Wil O'Mullane, Angelo Fausti

**Stack Version:** w_2018_22

**Image Size:** Small

**Purpose:** Demonstrates the HoloViews+Datashader+Bokeh stack for data exploration, in particular the [linked streams](http://holoviews.org/user_guide/Custom_Interactivity.html) functionality.

In [None]:
import os
from lsst.daf.persistence import Butler
import pandas as pd
import holoviews as hv
import numpy as np
from holoviews.operation.datashader import datashade
import param
from holoviews import streams

hv.extension('bokeh')

We'll use results from the [HSC RC2 reprocessing](https://confluence.lsstcorp.org/display/DM/Reprocessing+of+the+HSC+RC2+dataset).

In [None]:
datadir = '/datasets/hsc/repo/rerun/RC/w_2018_15/DM-14123/'
butler = Butler(datadir)

Which dataset types are available? Which filters, tracts, and patches can we use?

Currently, it is not possible to query the data repository for coadd metadata. For now, we have to know the available dataIds in advance, using information from the link above. Butler Gen3 will fix that.

In [None]:
# dataids = butler.subset('deepCoadd_forced_src', dataId={'filter': 'HSC-G', 'tract': 9813 }).cache

Let's read the `deepCoadd_forced_src` dataset type for the `HSC-G` filter for one tract and one patch. We'll use pandas DataFrames, which play well with HoloViews.



In [None]:
_filter = 'HSC-G'
tract = 9813
patch = '7,3'

df = butler.get('deepCoadd_forced_src', 
                dataId={'tract': tract, 'patch': patch, 'filter': _filter}).asAstropy().to_pandas()

# Get only a few columns of interest
df = df[['coord_ra', 'coord_dec', 'modelfit_CModel_flux']]



In [None]:
df['modelfit_CModel_flux'].describe()

Get rid of `NaN` values and non-positive fluxes.

In [None]:
df = df.dropna()
df = df.loc[df['modelfit_CModel_flux'] > 0]

In [None]:
# len(df) counts rows (objects); df.size would count rows * columns
print('Total objects in this patch = {}'.format(len(df)))
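
The cleaning and counting steps can be checked on a toy table. This is a minimal sketch with made-up values (the `toy` frame is hypothetical, not catalog data); it also shows that `len(df)` counts rows whereas `df.size` counts cells, i.e. rows times columns.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the catalog; the values are invented for illustration.
toy = pd.DataFrame({
    'coord_ra': [0.1, 0.2, 0.3, 0.4],
    'coord_dec': [1.0, 1.1, 1.2, 1.3],
    'modelfit_CModel_flux': [5.0, np.nan, -2.0, 7.5],
})

# Same cleaning steps as in the notebook: drop NaN rows, keep positive fluxes.
clean = toy.dropna()
clean = clean.loc[clean['modelfit_CModel_flux'] > 0]

n_rows = len(clean)    # number of objects (rows)
n_cells = clean.size   # rows * columns
print(n_rows, n_cells)
```

Two rows survive the cuts here, so `len` reports 2 while `.size` reports 6 for the three-column frame.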

By creating a HoloViews dataset, we annotate the data for visualization, declaring which columns correspond to independent variables (called key dimensions or kdims) and which to dependent variables (called value dimensions or vdims).



In [None]:
kdims = [('coord_ra', 'RA(deg)'), ('coord_dec', 'Dec(deg)')]
vdims = [('modelfit_CModel_flux', 'CModel Flux')]
ds = hv.Dataset(df, kdims, vdims)
ds

To visualize a HoloViews dataset, we can select one of the [HoloViews elements](http://holoviews.org/reference/index.html#elements). Perhaps the most obvious visualization for this dataset is to display the spatial distribution of the objects using the `Points` element.

In [None]:
%%opts Points [tools=['box_select']]
points = hv.Points(ds).options({'Points': {'size': 1, 'alpha': 0.5}})
points

`hv.Points` works well for datasets of modest size (`N<~50k`). Later we'll load a tract's worth of data and we'll need another approach for visualization, called datashader.

For a dataset this small, datashader is not really needed, but it is applied in the same way.

In [None]:
%%opts Points [tools=['box_select']]
image = datashade(hv.Points(ds))
image

Notice, however, that the `box_select` tool is not available when using datashader. That's because a selection on the image representation cannot return the index of the data points anymore. That means we'll have to use another strategy for selecting data when using datashader later.
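
One possible strategy, sketched here as an assumption rather than as what this notebook does, is to select by region instead of by point index: take the rectangle drawn on the image and filter the underlying DataFrame with it. The helper `select_in_bounds` and the `(left, bottom, right, top)` layout of `bounds` are hypothetical choices for illustration, and the table values are made up.

```python
import pandas as pd

def select_in_bounds(df, bounds, x='coord_ra', y='coord_dec'):
    """Return the rows of df whose (x, y) position lies inside bounds.

    bounds is assumed to be a (left, bottom, right, top) tuple.
    """
    left, bottom, right, top = bounds
    # Series.between is inclusive on both ends by default.
    inside = df[x].between(left, right) & df[y].between(bottom, top)
    return df.loc[inside]

# Synthetic positions and fluxes, for illustration only.
toy = pd.DataFrame({
    'coord_ra': [0.10, 0.25, 0.40, 0.55],
    'coord_dec': [1.00, 1.10, 1.20, 1.30],
    'modelfit_CModel_flux': [5.0, 3.0, 7.5, 1.0],
})

selected = select_in_bounds(toy, (0.2, 1.05, 0.5, 1.25))
print(selected)
```

Because the filter works on coordinates, it does not need the per-point indices that are lost once the points are rasterized.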


Creating a histogram is a little more involved. Apparently it is not yet possible to map the value dimension directly onto the Histogram element and reuse the label annotations for this variable.

In [None]:
frequencies, edges = np.histogram(ds.data['modelfit_CModel_flux'], 20)
hist = hv.Histogram((np.log(frequencies), edges))
hist
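
One thing to watch with log-scaled counts: `np.histogram` with 20 bins returns 20 counts and 21 edges, and any empty bin turns into `-inf` under `np.log`. The sketch below uses made-up fluxes to show this. `np.log1p` (the log of 1 + n) is offered only as one possible alternative; it is a substitution of mine, not what this notebook uses.

```python
import numpy as np

# Made-up fluxes with a single bright outlier, chosen so most bins are empty.
flux = np.array([1.0] * 5 + [2.0] * 3 + [100.0])

frequencies, edges = np.histogram(flux, 20)

# Plain log: empty bins become -inf (the divide warning is silenced here).
with np.errstate(divide='ignore'):
    naive = np.log(frequencies)

# log(1 + n): empty bins map to 0 and every value stays finite.
safe = np.log1p(frequencies)

print(len(frequencies), len(edges))
print(int(np.isinf(naive).sum()), bool(np.isfinite(safe).all()))
```

Note that `log1p` changes the plotted values slightly for non-empty bins too, so the axis label should say which transform was applied.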

Now we can use the [linked streams](http://holoviews.org/user_guide/Custom_Interactivity.html) functionality to link both plots. 

In [None]:
listing = ', '.join(sorted([str(s.name) for s in param.descendents(streams.LinkedStream)]))
listing

There are streams to access plotting selections made using the box- and lasso-select tools (`Selection1D`), the plot size (`PlotSize`), and the `Bounds` of a selection. This is really exciting: we can select regions on the spatial plot above, and that will produce a histogram for the selected objects.

With `hv.Points` we can use the `Selection1D` linked stream and declare the `points` instance as the source of the selection.

In [None]:
selection = streams.Selection1D(source=points)
print('The %s stream has contents %r' % (selection, selection.contents))

We define a custom callback function that creates a histogram from the selected data.

In [None]:
def update_histogram(index):
    # this will initialize the histogram with data in case there's no selection
    if len(index) > 0:
        selected_flux = ds.data.iloc[index]['modelfit_CModel_flux']
    else:
        selected_flux = ds.data['modelfit_CModel_flux']
    
    frequencies, edges = np.histogram(selected_flux, 20)
    
    hist = hv.Histogram((np.log(frequencies), edges))
    return hist
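
The selection logic of the callback can be exercised without any plotting. This sketch assumes the stream reports row positions, which is what the use of `.iloc` in the callback implies; the helper `selected_flux` and the table values are hypothetical. After filtering, the index labels have gaps, so positional lookup and label lookup are no longer the same thing.

```python
import numpy as np
import pandas as pd

# Made-up fluxes, cleaned the same way as in the notebook.
toy = pd.DataFrame({'modelfit_CModel_flux': [5.0, np.nan, -2.0, 7.5, 3.0]})
toy = toy.dropna()
toy = toy.loc[toy['modelfit_CModel_flux'] > 0]
# Surviving index labels are 0, 3, 4; their positions are 0, 1, 2.

def selected_flux(data, index):
    """Mirror the branch in update_histogram, without building a plot."""
    if len(index) > 0:
        # Positional lookup: index holds row positions, not labels.
        return data.iloc[index]['modelfit_CModel_flux']
    # Empty selection: fall back to the full column.
    return data['modelfit_CModel_flux']

picked = selected_flux(toy, [1, 2])    # second and third surviving rows
everything = selected_flux(toy, [])    # no selection yet

print(picked.tolist(), everything.tolist())
```

With `.loc` instead of `.iloc`, the positions 1 and 2 would be read as labels, which no longer exist in the filtered frame.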
    

Finally, we associate the selection stream instance with the callback function. This is done using a `DynamicMap`, which links both plots.


In [None]:
%%opts Points [tools=['box_select']]
dmap = hv.DynamicMap(update_histogram, streams=[selection])
points + dmap

The following cells are useful for debugging:

In [None]:
# selection.contents

In [None]:
#index = selection.contents['index']

#frequencies, edges = np.histogram(ds.data.iloc[index]['modelfit_CModel_flux'], 20)
#hist = hv.Histogram((frequencies, edges))
#hist

Now, let's load a tract's worth of data to make the datashader example more interesting.

In [None]:
_filter = 'HSC-G'
tract = 9813

Because of the limitations in the butler implementation mentioned above, we'll have to look at the data output repository to get the list of available patches.

Then we can loop over the dataIds and load the corresponding catalogs.

In [None]:
# The first tuple yielded by os.walk is (dirpath, dirnames, filenames); the patch names are the dirnames
patches = next(os.walk(os.path.join(datadir, 'deepCoadd-results', _filter, str(tract))))[1]
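
The directory listing can be tried out on a throwaway tree. This is a minimal sketch: the helper `list_subdirs` and the patch names are hypothetical, and the `deepCoadd-results/<filter>/<tract>/<patch>` layout is assumed from the path used above.

```python
import os
import tempfile

def list_subdirs(path):
    """Return the sorted names of the immediate subdirectories of path."""
    # The first tuple yielded by os.walk is (path, dirnames, filenames).
    _, dirnames, _ = next(os.walk(path))
    return sorted(dirnames)

with tempfile.TemporaryDirectory() as root:
    tract_dir = os.path.join(root, 'deepCoadd-results', 'HSC-G', '9813')
    for name in ('7,3', '7,4', '8,3'):
        os.makedirs(os.path.join(tract_dir, name))
    # A plain file at the same level must not be reported as a patch.
    open(os.path.join(tract_dir, 'notes.txt'), 'w').close()
    found = list_subdirs(tract_dir)

print(found)
```

Taking only the first item from `os.walk` avoids descending into every patch directory just to read the top-level names.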

In [None]:
%%time
catalogs = []
for patch in patches:
    print("Loading filter: {} tract: {}, patch: {}".format(_filter, tract, patch))
    catalog = butler.get('deepCoadd_forced_src', 
                         dataId={'tract': tract, 'patch': patch, 'filter': _filter}).asAstropy().to_pandas()
    # Get only a few columns of interest
    tmp = catalog[['coord_ra', 'coord_dec', 'modelfit_CModel_flux']]
    catalogs.append(tmp)


In [None]:
df = pd.concat(catalogs)
df = df.dropna()
df = df.loc[df['modelfit_CModel_flux'] > 0]
# len(df) counts rows (objects); df.size would count rows * columns
print('Total objects in this tract = {}'.format(len(df)))
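
A side note on the concatenation step: `pd.concat` keeps each catalog's own index labels, so labels repeat across patches. The sketch below uses two made-up frames to show this, with `ignore_index=True` as one way to renumber; whether that is needed depends on whether anything downstream looks rows up by label.

```python
import pandas as pd

# Two made-up per-patch tables, each with its own 0-based index.
patch_a = pd.DataFrame({'modelfit_CModel_flux': [1.0, 2.0]})
patch_b = pd.DataFrame({'modelfit_CModel_flux': [3.0, 4.0, 5.0]})

# Default behaviour: the original labels are kept, so they repeat.
stacked = pd.concat([patch_a, patch_b])

# ignore_index=True assigns a fresh 0..n-1 index.
renumbered = pd.concat([patch_a, patch_b], ignore_index=True)

print(list(stacked.index))
print(list(renumbered.index))
```

Positional access with `.iloc` is unaffected by the repeated labels, which is why the callback earlier in this notebook would still behave the same on either frame.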

As mentioned above, we need datashader to visualize a dataset of this size. See how nicely it plays with the Bokeh pan and zoom tools.

In [None]:
kdims = [('coord_ra', 'RA(deg)'), ('coord_dec', 'Dec(deg)')]
vdims = [('modelfit_CModel_flux', 'CModel Flux')]
ds = hv.Dataset(df, kdims, vdims)
ds

In [None]:
%%opts Points [tools=['box_select']]
image = datashade(hv.Points(ds))
image

In [None]:
frequencies, edges = np.histogram(ds.data['modelfit_CModel_flux'], 20)
hist = hv.Histogram((np.log(frequencies), edges))
hist