# Gene's Visualization Examples


## Preamble: Load Required Python Libraries

In [None]:
try:
    import os
    from datetime import datetime
    import numpy as np

    import pandas as pd
    import geopandas as gpd
    from shapely.geometry import Point
    import matplotlib.pyplot as plt

    import holoviews as hv
    import hvplot.pandas
    import hvplot.xarray
    import panel as pn
    from geoviews import tile_sources as gvts
except ImportError:
    print("A required library could not be found. ")
    raise

## Foreshadowing
You've seen some plots already as part of data exploration. We're going to look at how some of those might be made using `hvplot`, but we'll start at the beginning and work toward it. Here's the flow:
* Plotting methods you may already know (`matplotlib` or `seaborn`)
* How typical plots in `hvplot` differ
  * Structuring Data
  * Interactivity
* Designing x-y plots from scratch 
* Shortcuts with structured data
* Subplots vs shared axes
* Adding interactivity (examples)

## Reference Reading:
* https://panel.holoviz.org/
* https://hvplot.holoviz.org/
* https://matplotlib.org/stable/index.html

## Example 1: Toy data for simple plots

We're going to make a few simplified x-y line plots. To keep things simple, let's create
a couple of variables to plot.


In [None]:
## create a sequence of data points in x
n = 50*np.pi  # number of sample points in the domain
L = np.pi*10  # domain / input range over which to plot
dt = L/n      # space between points

t = np.arange(-L/2, L/2, dt) 
f_t = np.cos(t/8) * np.cos((2/3) * t) 


So now we have two arrays of numbers.


$$
f(t) = \cos\Big(\frac{t}{8}\Big) \; \cos\Big(\frac{2 t}{3}\Big)
$$

and

$$
t \in \big[ -5\pi , 5\pi \big), \quad \text{sampled every } dt
$$
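With the values from the cell above ($n = 50\pi$, $L = 10\pi$), the sample spacing and the number of samples work out as follows. NumPy's `arange` returns $\lceil (\text{stop} - \text{start}) / \text{step} \rceil$ elements, so `n` is only approximately the sample count; the exact count is `len(t)`.

$$
dt = \frac{L}{n} = \frac{10\pi}{50\pi} = 0.2,
\qquad
N = \Big\lceil \frac{L}{dt} \Big\rceil = \lceil 157.08\ldots \rceil = 158
$$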

Let's see the raw numbers: 


In [None]:
t[0:10] # just the first few elements

In [None]:
f_t[0:10]

These two arrays stand in for some data.

We'll create a few simple line plots where $t$ is on the horizontal axis and $f(t)$ is on the vertical axis.

### Example 1a: Plot Using `matplotlib`

In [None]:
# using matplotlib's pyplot: 
plt.plot(t, f_t)
plt.show()

The `matplotlib` plotting library has a long history of making charts and graphs in Python. You may have used it before to make figures for publications.

Matplotlib was designed primarily for static figures (e.g. PNG or PDF files); interactive features were added later.

### Example 1b: Plot using `hvplot`
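`hvplot` is built on HoloViews, which by default renders through Bokeh, a plotting library whose
figures run as JavaScript in the browser and are designed for interactivity. That makes them an
excellent match for Jupyter notebooks. The cell below uses HoloViews' `hv.Curve` element directly,
the same kind of object `hvplot` returns for a simple line plot.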


In [None]:
## hv.extension('bokeh')
hv.Curve( 
        zip(t, f_t),
        label="f(t)"
    ).opts(height=300, width=600, color='red') 

Note that the data is passed to the plotting routine differently. Rather than passing two
separate vectors of matching values (as in the `matplotlib` call above), we pass an iterable
of 2-tuples, with each $(t, f(t))$ point paired inside its tuple.

Python's built-in `zip` function merges `t` and `f_t` into those pairs. This is one of several
input formats HoloViews accepts; a tuple of arrays, `hv.Curve((t, f_t))`, works as well.
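A plain-Python sketch of what `zip` does, using toy numbers rather than the notebook's arrays. It also shows one behavior worth knowing: `zip` stops at the shorter input and silently drops the extra elements.

```python
# Toy values standing in for t and f(t); any two equal-length sequences work.
t_vals = [0.0, 0.2, 0.4]
f_vals = [1.0, 0.9, 0.7]

# zip pairs the i-th elements: [(t0, f0), (t1, f1), ...]
pairs = list(zip(t_vals, f_vals))
print(pairs)  # [(0.0, 1.0), (0.2, 0.9), (0.4, 0.7)]

# zip stops at the shorter input, silently dropping the extra element.
short = list(zip(t_vals, f_vals[:2]))
print(len(short))  # 2

# zip(*pairs) "unzips" the points back into two parallel tuples.
t_back, f_back = zip(*pairs)
print(t_back)  # (0.0, 0.2, 0.4)
```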

In [None]:
list( zip(t, f_t)  )[0:10] ## just the first several points


## Example 2: Combining Plots

Combining plots is much easier with HoloViews/`hvplot` objects than in `matplotlib` or `plotly`. The plot elements are implemented with **operator overloading**, so infix operators like `+` and `*` have meanings specific to those plot objects: `+` places plots side by side in a layout, and `*` overlays them on shared axes.
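To see how operator overloading works in Python in general, here is a minimal sketch with hypothetical toy classes (`ToyPlot`, `ToyLayout`, `ToyOverlay`). They are not HoloViews' real classes; they only show how defining `__add__` and `__mul__` gives `+` and `*` plot-specific meanings.

```python
class ToyPlot:
    """Hypothetical stand-in for a plot element; not HoloViews' real class."""

    def __init__(self, name):
        self.name = name

    def __add__(self, other):
        # `a + b` -> side-by-side layout of two panels
        return ToyLayout([self, other])

    def __mul__(self, other):
        # `a * b` -> overlay of both elements on shared axes
        return ToyOverlay([self, other])


class ToyLayout:
    def __init__(self, items):
        self.items = items

    def __repr__(self):
        return "Layout(" + " | ".join(p.name for p in self.items) + ")"


class ToyOverlay:
    def __init__(self, items):
        self.items = items

    def __repr__(self):
        return "Overlay(" + " * ".join(p.name for p in self.items) + ")"


p1 = ToyPlot("f(t)")
p2 = ToyPlot("f'(t)")
print(p1 + p2)  # Layout(f(t) | f'(t))
print(p1 * p2)  # Overlay(f(t) * f'(t))
```

The real HoloViews objects also support chaining such as `a * b * c` and nesting like `a + (b * c)`; this sketch deliberately does not.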

Let's do some light calculus here to get a few extra bits of data to plot. This is just a contrived example to get some data -- nothing special about this function or its derivative.... we're just having fun here. 


In [None]:
true_df = (np.cos(t/8) * -(np.sin(t/1.5)/1.5)) + (np.cos(t/1.5) * -np.sin(t/8)/8)
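The expression for `true_df` is the product rule applied to $f(t)$. Writing $\tfrac{t}{1.5} = \tfrac{2t}{3}$, each term in the code corresponds to one term below:

$$
f'(t) = -\frac{2}{3}\cos\Big(\frac{t}{8}\Big)\sin\Big(\frac{2t}{3}\Big) - \frac{1}{8}\sin\Big(\frac{t}{8}\Big)\cos\Big(\frac{2t}{3}\Big)
$$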


In [None]:
f_curve     = hv.Curve(zip(t, f_t),     label="f(t)"  ).opts(color='black')
f_dot_curve = hv.Curve(zip(t, true_df), label="f '(t)").opts(color='gray', line_dash=(4,4))


Note here that we did not display the curves, but rather assigned the plots to variables. We can display them individually: 


In [None]:
display(f_curve)


In [None]:
display(f_dot_curve)

...Or we can combine them... note the operators:

In [None]:
fig = f_curve + f_dot_curve

display(fig)

In [None]:
fig = (f_curve * f_dot_curve)
display(
    fig.opts(height=300, width=600)
)

### Example 2a: 3 or more plots

In [None]:
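# np.diff returns one fewer element than t, so pair each forward difference with the left-hand sample t[:-1]
f_dot_finite_difference = hv.Curve(
    zip(t[:-1], np.diff(f_t) / dt), label="finite diff"
).opts(color='magenta', line_dash=(2, 2))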

fig = f_dot_finite_difference * f_curve * f_dot_curve 
fig.opts(
    hv.opts.Curve( height=300, width=600)
)
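`np.diff(f_t) / dt` is a forward difference, (f[i+1] - f[i]) / dt, plotted at the left-hand sample t[i]. The stdlib-only sketch below (scalar `math` functions instead of NumPy arrays; the evaluation point `t0 = 1.0` is an arbitrary choice) shows how its error shrinks as the step `h` shrinks, using the same toy function and its exact derivative.

```python
import math

def f(t):
    # Same toy function as in the notebook: cos(t/8) * cos(2t/3)
    return math.cos(t / 8) * math.cos(2 * t / 3)

def df_exact(t):
    # Exact derivative via the product rule
    return (-(2 / 3) * math.cos(t / 8) * math.sin(2 * t / 3)
            - (1 / 8) * math.sin(t / 8) * math.cos(2 * t / 3))

def forward_difference(func, t, h):
    # (f(t + h) - f(t)) / h, what np.diff(f_t) / dt computes at each sample
    return (func(t + h) - func(t)) / h

t0 = 1.0
for h in (0.2, 0.1, 0.05):
    err = abs(forward_difference(f, t0, h) - df_exact(t0))
    print(f"h={h:<5} error={err:.2e}")
```

Halving `h` should roughly halve the error, the signature of a first-order method, which is why the finite-difference curve can differ slightly from the exact f'(t).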

In [None]:
fig = f_curve + (f_dot_curve * f_dot_finite_difference)

fig.opts(
    hv.opts.Curve( height=300, width=500)
)

## Example 3: Structured Data

One of the most powerful features of `hvplot` is that it 'understands' data structures like 
pandas dataframes and xarray datasets.   For example, let's look at some streamflow gage data: 

In [None]:
import intake
cat = intake.open_catalog(r'https://raw.githubusercontent.com/hytest-org/hytest/main/dataset_catalog/hytest_intake_catalog.yml')
obs_data_src='nwis-streamflow-usgs-gages-cloud'
mod_data_src='nwm21-streamflow-usgs-gages-cloud'
variable_of_interest = 'streamflow'
try:
    obs = cat[obs_data_src].to_dask()
    mod = cat[mod_data_src].to_dask()
except KeyError:
    print("Something wrong with dataset names.")
    raise
start=datetime.strptime("01 01 18", '%d %m %y')
end=datetime.strptime("31 12 20", '%d %m %y')



We now have two datasets, one for `modeled` data: 

In [None]:
mod

And one for observed data

In [None]:
obs

This data is already structured, meaning there are natural associations between time values and
the values within the `streamflow` variable. `hvplot` knows how to use this, so
we **don't** have to construct a time vector and a data vector, pair them, and hand the 2-tuples
to the plotting routine. We can ask the xarray data structure to work it out and
generate a figure on its own; it will usually pick useful defaults:

In [None]:
g = 'USGS-13317000'
obs['streamflow'].sel(gage_id=g).hvplot()

In [None]:
o = obs['streamflow'].sel(gage_id=g).where(obs.time.dt.year>=2018).hvplot(label="observed").opts(color='blue', line_width=0.5, xlim=(start, end)) 

m = mod['streamflow'].sel(gage_id=g).where(mod.time.dt.year>=2018).hvplot(label="modeled").opts(color='red', line_width=0.5, xlim=(start, end)) 


In [None]:
o * m 

## Example 4: Interactive Plot Generation
We've looked at 'standard' plots where the figure itself has interactive features.  

We can also move the interactivity 'up' a layer such that the plot is dynamically re-generated based on input from the user. 

This requires another library: `panel`.
Panel apps can contain `hvplot` figures, and layout can be done either at the `hvplot` level or at the `panel` level.
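The mechanism behind this is an observer pattern: a widget keeps a list of functions that depend on it and calls them again whenever its value changes. The toy classes below are a hypothetical, stdlib-only illustration of that idea; they are not Panel's implementation, which handles this for you through `pn.depends` and the `param` library it is built on.

```python
class ToySlider:
    """Hypothetical minimal widget: holds a value and notifies watchers on change."""

    def __init__(self, value):
        self._value = value
        self._watchers = []

    def watch(self, callback):
        self._watchers.append(callback)

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new):
        self._value = new
        for callback in self._watchers:
            callback(new)  # re-run every dependent function with the new value


rendered = []

def render(dx):
    # Stand-in for a plot function: "re-draws" by recording a description.
    rendered.append(f"plot with dx={dx}")

slider = ToySlider(1.0)
slider.watch(render)
render(slider.value)  # initial draw
slider.value = 0.5    # a simulated user drag triggers a re-draw
print(rendered)       # ['plot with dx=1.0', 'plot with dx=0.5']
```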

Here's an example.... 

In [None]:
## This is the slider that lets us select dt
dt_select = pn.widgets.FloatSlider(
    name='dt', 
    start=0.2, end=2.0,
    step=0.1,
    value=1.0
)

## This 'decorator' declares that the `plot` function depends on the value of the slider.
@pn.depends(_dx=dt_select)
def plot(_dx):
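    ## estimate the derivative of f on a grid with spacing _dx
    x = np.arange(-L/2, L/2, _dx)
    f_x = np.cos(x/8) * np.cos(x/1.5)  # f(x) itself, not its derivative
    # central difference: convolving with [1, 0, -1] / (2*_dx) gives (f[i+1] - f[i-1]) / (2*_dx);
    # edge padding plus the [1:-1] slice keeps the output the same length as x
    estimated_df = np.convolve(
        np.pad(f_x, 1, mode='edge'),
        np.array([1, 0, -1]) / (2 * _dx),
        mode='same')[1:-1]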
    fig = \
        hv.Curve(zip(t, true_df), "t", "f '(t)", label="TRUE f '(t)").opts(color='gray', line_dash=(4, 4)) *\
        hv.Curve(zip(x, estimated_df), label="EST f '(t)").opts(color='magenta') * \
        hv.Text(-10, 0.6, f"dt = {_dx:.4f}")
    return fig.opts(
        hv.opts.Curve( height=300, width=600, )
    )

## This is panel's way of arranging elements: the slider and the plot function, stacked in a column.
disp = pn.Column(
    dt_select, 
    plot
)

disp.servable('Effects of dx on Finite Difference Derivative')
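Inside `plot`, `np.convolve` with the kernel `[1, 0, -1] / (2 * _dx)` computes a central difference: convolution flips the kernel, so each output sample is (f[i+1] - f[i-1]) / (2 dx). A stdlib-only sketch of the same computation (an explicit loop instead of `np.convolve`; `sin` is used as a test function because its derivative is known):

```python
import math

def central_difference(samples, dx):
    """Central-difference derivative with edge padding, mirroring the np.pad/np.convolve cell."""
    padded = [samples[0]] + list(samples) + [samples[-1]]  # like np.pad(..., 1, mode='edge')
    return [(padded[i + 1] - padded[i - 1]) / (2 * dx)
            for i in range(1, len(padded) - 1)]

dx = 0.1
xs = [i * dx for i in range(50)]
ys = [math.sin(x) for x in xs]
est = central_difference(ys, dx)

# Compare with the exact derivative cos(x) away from the two end points.
interior_err = max(abs(e - math.cos(x)) for e, x in zip(est[1:-1], xs[1:-1]))
print(len(est) == len(ys), f"{interior_err:.1e}")
```

At the first and last samples the edge padding yields (f[1] - f[0]) / (2 dx), half of a one-sided difference, so the estimate is biased at the ends of the domain.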

## Interactivity and Maps
Using similar widgets, we can make a more involved mapping plot driven by user input. This is a good way to explore datasets.


In [None]:
# Read some gages:
cobalt_df = pd.read_csv(
    'https://www.sciencebase.gov/catalog/file/get/6181ac65d34e9f2789e44897?f=__disk__22%2F1a%2Fd2%2F221ad2fe9d95de17731ad35d0fc6b417a4b53ee1',
    dtype={'site_no':str, 'huc_cd':str, 'reachcode':str, 'comid':str, 'gagesII_class':str, 'aggecoregion': str}, 
    index_col='site_no'
    )
cobalt_df.rename(columns={'dec_lat_va':'Lat', 'dec_long_va':'Lon'} , inplace=True)
# Re-format the gage_id/site_no string value.  ex:   "1000000"  ==> "USGS-1000000"
cobalt_df.rename(index=lambda x: f'USGS-{x}', inplace=True)

NWM = pd.read_csv(r'../data/NWM_v2.1_streamflow_example.csv', dtype={'site_no':str} ).set_index('site_no', drop=False)
# Merge benchmarks with cobalt data to form a single table, indexed by site_no
metrics = NWM.columns.tolist()[1:] #list of columns, EXCEPT the first column (site_no)
NWM = NWM.merge(
    cobalt_df, # Table to merge with NWM
    how='left',            # left join keeps every row of NWM; rows with no match in cobalt_df get NaN
    left_index=True, 
    right_index=True
    )
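A left join keeps every row of the left table and attaches matching columns from the right table. The stdlib-only sketch below uses a hypothetical `left_join` helper over dicts keyed by site id, with made-up ids and values, to show which rows survive. Note that keys must match exactly, so the site-id formats in both tables have to agree.

```python
def left_join(left, right):
    """Hypothetical helper: left join of two dicts keyed by site id.

    Every key in `left` is kept; columns from `right` are added when the key
    matches, and filled with None (pandas would use NaN) when it does not.
    Keys that appear only in `right` are dropped.
    """
    right_cols = sorted({col for row in right.values() for col in row})
    joined = {}
    for key, row in left.items():
        match = right.get(key, {})
        joined[key] = {**row, **{col: match.get(col) for col in right_cols}}
    return joined

# Toy rows with made-up ids and values (not the notebook's data).
toy_metrics = {"site-A": {"pearson": 0.8}, "site-B": {"pearson": 0.6}}
toy_gage_info = {"site-A": {"Lat": 45.0, "Lon": -114.0},
                 "site-C": {"Lat": 40.0, "Lon": -105.0}}

print(left_join(toy_metrics, toy_gage_info))
```

This sketch lets the right table overwrite overlapping column names; `pandas.merge` instead keeps both and adds suffixes (`_x`, `_y` by default).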


In [None]:
var_select = pn.widgets.Select(name='Metric', options=metrics, value='pearson')

base_map_select = pn.widgets.Select(name='Basemap:', 
                                    options=list(gvts.tile_sources.keys()), 
                                    value='OSM')


@pn.depends(var=var_select, base_map=base_map_select)
def plot(var, base_map):
    return NWM.hvplot.points(x='Lon', y='Lat', color=var, cmap='turbo_r', geo=True, tiles=base_map)

col = pn.Column(var_select, base_map_select, plot)
col.servable('Hydro Assessment Tool')

## Map as Selector / Combine with Time Series

In [None]:
# Geo-Enable our cobalt data:
import geoviews as gv
cobalt = gpd.GeoDataFrame(
    cobalt_df, 
    geometry=gpd.points_from_xy(cobalt_df.Lon, cobalt_df.Lat), 
    crs="EPSG:4326"
)
gage_map = cobalt.hvplot.points(
    geo=True, 
    color='blue',
    marker='^',
    size=12, 
    hover_cols=['site_no'],
)
pmap = (gvts.EsriTerrain * gage_map)
pmap

In [None]:
pn.extension()
# create a Tap stream that captures mouse clicks on the gage map
clicky_ = hv.streams.Tap(source=gage_map, x=-114.5985, y=45.2986) 

@pn.depends(clicky_.param.x, clicky_.param.y)
def timeseries(x, y):
    _p = Point((x, y))
    nearest_index = cobalt.sindex.nearest(_p)[1][0]
    site_id = cobalt.index[nearest_index]
    o = obs['streamflow'].sel(gage_id=site_id).where(obs.time.dt.year>=2018).hvplot(label="observed").opts(color='blue', line_width=0.5, xlim=(start, end)) 
    m = mod['streamflow'].sel(gage_id=site_id).where(mod.time.dt.year>=2018).hvplot(label="modeled").opts(color='red', line_width=0.5, xlim=(start, end)) 
    return (o*m).opts(width=800, height=400)

@pn.depends(clicky_.param.x, clicky_.param.y)
def location(x,y):
    return pn.pane.Str(f'click at >>> {x:.4f}//{y:.4f}')

viewer = pn.Column(location, pmap, timeseries)
viewer
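`cobalt.sindex.nearest` picks the closest gage by planar distance in the layer's coordinate units (degrees here, since the CRS is EPSG:4326), which is usually fine for finding the gage under the cursor. If true distances matter, a brute-force great-circle search is one alternative. The stdlib-only sketch below uses the haversine formula with a mean Earth radius of 6371 km; the helper names and gage ids/coordinates are hypothetical placeholders, not real sites.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two (lon, lat) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def nearest_gage(lon, lat, gages):
    """Return the id in `gages` ({id: (lon, lat)}) closest to the clicked point."""
    return min(gages, key=lambda gid: haversine_km(lon, lat, *gages[gid]))

# Hypothetical gages; ids and coordinates are placeholders, not real sites.
example_gages = {
    "gage-A": (-114.60, 45.30),
    "gage-B": (-116.20, 43.60),
    "gage-C": (-111.90, 40.80),
}
print(nearest_gage(-114.5985, 45.2986, example_gages))  # gage-A
```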