#### Use HoloViews to interactively visualize HS2 data from a STOQS Parquet request

This Notebook is part of the auv-python project (private repository at https://github.com/mbari-org/auv-python). It demonstrates how to read and make interactive plots of millions of data points accessed from a STOQS database.

To execute it (for example):

```bash
    cd GitHub  # Or other appropriate directory on your computer
    git clone https://github.com/mbari-org/auv-python.git
    cd auv-python
    poetry install
    poetry shell
    cd notebooks
    jupyter-lab
    # Open this notebook and run it from your browser - interactivity does not work in VS Code
```

The URLs in the pooch.retrieve() calls below were generated by going to https://stoqs.shore.mbari.org/stoqs_all_dorado/ and clicking the buttons of the Measured Parameters to be included in the Parquet file, then opening the "Measured Parameter Data Access" section and clicking the "Estimate requirements" button to verify that the estimated requirements are within what the server has available.
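As an illustration of how such a request URL is put together, here is a minimal sketch that assembles the same kind of query string from a list of parameter names. The helper `build_parquet_url` is hypothetical and not part of STOQS or auv-python; the query keys are copied from the URL used in this notebook. Spaces in the date values are encoded as `%20` rather than `+`, so the result is not byte-identical to the URL the STOQS UI generates, and it is an assumption that the server treats the two forms as equivalent.

```python
from urllib.parse import quote, urlencode

# Endpoint taken from the pooch.retrieve() call in this notebook
BASE = "https://stoqs.shore.mbari.org/stoqs_all_dorado/api/measuredparameter.parquet"


def build_parquet_url(parameter_names, start, end, min_depth=-100.0, max_depth=10000.0):
    """Hypothetical helper: assemble a STOQS Parquet request URL."""
    # One parameter__name entry per Measured Parameter, in the order given
    query = [("parameter__name", name) for name in parameter_names]
    query += [
        ("measurement__instantpoint__timevalue__gt", start),
        ("measurement__instantpoint__timevalue__lt", end),
        ("measurement__depth__gte", min_depth),
        ("measurement__depth__lte", max_depth),
        ("collect", "name"),
        ("include", "activity__name"),
    ]
    # quote() percent-encodes spaces and parentheses; safe="/" leaves "/" as-is
    return BASE + "?" + urlencode(query, quote_via=quote, safe="/")


url = build_parquet_url(
    ["hs2_bb700 (m-1)", "hs2_fl700"],
    "2016-01-01 00:00:00",
    "2020-12-31 23:59:59",
)
print(url)
```

Generating the URL from the STOQS UI remains the safer route, because the "Estimate requirements" step checks that the request fits within the server's resources.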

Executing the last cell results in an interactive plot with Depth and Date range selection widgets, e.g.:

![Select interesting time period within the data](hvplot_5.9-mpm_screenshot1.png "5.9-mpm screenshot")

In [1]:
import holoviews as hv
import hvplot.pandas
import ipywidgets as widgets
import numpy as np
import pandas as pd
import panel as pn
import pooch
from holoviews.operation.datashader import datashade

NOTICE: The known_hash in the code below should not change. However, if pooch raises a hash mismatch error, replace the known_hash value with the hash that the error message reports it "got" in place of the one it expected.
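To check a downloaded file independently of the error message, its SHA256 digest can be computed with the standard library. This is a minimal sketch: `sha256_of_file` is a hypothetical helper name, and the demonstration hashes a throwaway file rather than the real Parquet file. With the notebook's data, the cached path returned by `pooch.retrieve()` would be passed instead.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large Parquet files need not fit in memory
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demonstration on a small temporary file
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo.bin"
    demo_path.write_bytes(b"abc")
    demo_hash = sha256_of_file(demo_path)

print(demo_hash)
```

The resulting 64-character hex string is what belongs in `known_hash`. pooch also ships a `pooch.file_hash()` function that should give the same value for a cached file.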

In [2]:
%%time
# Can take several minutes to retrieve the data the first time, thereafter it's read from a local cache
data_file = pooch.retrieve(
    url="https://stoqs.shore.mbari.org/stoqs_all_dorado/api/measuredparameter.parquet?parameter__name=biolume_bg_biolume%20%28photons/liter%29&parameter__name=biolume_proxy_adinos&parameter__name=biolume_proxy_diatoms&parameter__name=biolume_proxy_hdinos&parameter__name=ctd1_temperature%20%28degree_Celsius%29&parameter__name=ctd2_temperature%20%28degree_Celsius%29&parameter__name=hs2_bb420%20%28m-1%29&parameter__name=hs2_bb700%20%28m-1%29&parameter__name=hs2_fl700&parameter__name=profile_number&measurement__instantpoint__timevalue__gt=2016-01-01+00%3A00%3A00&measurement__instantpoint__timevalue__lt=2020-12-31+23%3A59%3A59&measurement__depth__gte=-100.0&measurement__depth__lte=10000.0&collect=name&include=activity__name",
    known_hash="2765fd51e659b8335768aa3ba6d22830a88e734c4acc04b4c9054f6aab871f5c",
)
df = pd.read_parquet(data_file)
df.describe()

CPU times: user 6.27 s, sys: 947 ms, total: 7.22 s
Wall time: 6.77 s


name,biolume_bg_biolume (photons/liter),biolume_proxy_adinos,biolume_proxy_diatoms,biolume_proxy_hdinos,ctd1_temperature (degree_Celsius),ctd2_temperature (degree_Celsius),hs2_bb420 (m-1),hs2_bb700 (m-1),hs2_fl700,profile_number
count,4376859.0,1554146.0,1554146.0,1554146.0,4377483.0,4377255.0,3775769.0,3775680.0,3775769.0,4377511.0
mean,13952150000.0,0.06355984,0.06120775,0.05842985,11.80762,11.78784,0.004243722,0.006497401,0.0007222882,162.592
std,66631420000.0,0.1159976,0.1636134,0.405085,1.736753,1.729471,0.004331563,1.496655,0.001321515,111.7404
min,0.0,0.0,0.0,0.0,7.337319,7.33716,-0.001299565,-0.00523946,-0.09492986,1.0
25%,0.0,8.522036e-05,0.00106958,0.0,10.44547,10.41426,0.002001787,0.00166798,8.242336e-05,72.0
50%,385198000.0,0.01694408,0.01514859,0.0,11.77027,11.75009,0.003115333,0.003124387,0.0002382901,145.0
75%,7175917000.0,0.08231843,0.04571551,0.0,13.00337,12.98939,0.005224731,0.006240779,0.0008226301,239.0
max,5530390000000.0,1.880321,19.0255,28.63179,18.31997,18.31909,0.09945312,2662.765,0.09775415,448.0


In [3]:
# Remove egregious backscatter values
# Use .loc to avoid chained assignment, which may silently operate on a copy
df.loc[df["hs2_bb700 (m-1)"] > 0.1, "hs2_bb700 (m-1)"] = np.nan

# Turn the multi-index levels into regular columns in a modified dataframe (dfm)
dfm = df.reset_index()
dfm.describe()

name,timevalue,depth,latitude,longitude,biolume_bg_biolume (photons/liter),biolume_proxy_adinos,biolume_proxy_diatoms,biolume_proxy_hdinos,ctd1_temperature (degree_Celsius),ctd2_temperature (degree_Celsius),hs2_bb420 (m-1),hs2_bb700 (m-1),hs2_fl700,profile_number
count,4377511,4377511.0,4377511.0,4377511.0,4376859.0,1554146.0,1554146.0,1554146.0,4377483.0,4377255.0,3775769.0,3773710.0,3775769.0,4377511.0
mean,2018-12-29 21:32:23.667593728,33.71976,36.78165,-121.9394,13952150000.0,0.06355984,0.06120775,0.05842985,11.80762,11.78784,0.004243722,0.004763386,0.0007222882,162.592
min,2016-03-30 17:03:47,-1.305286,36.60523,-122.3752,0.0,0.0,0.0,0.0,7.337319,7.33716,-0.001299565,-0.00523946,-0.09492986,1.0
25%,2017-10-03 03:55:35.500000,4.141015,36.73058,-121.9884,0.0,8.522036e-05,0.00106958,0.0,10.44547,10.41426,0.002001787,0.001667471,8.242336e-05,72.0
50%,2019-01-30 13:58:34,24.50916,36.78966,-121.9242,385198000.0,0.01694408,0.01514859,0.0,11.77027,11.75009,0.003115333,0.003122117,0.0002382901,145.0
75%,2020-08-06 06:23:02.500000,51.70796,36.82673,-121.8733,7175917000.0,0.08231843,0.04571551,0.0,13.00337,12.98939,0.005224731,0.006233687,0.0008226301,239.0
max,2020-12-04 15:27:13,251.9176,36.92717,-121.8085,5530390000000.0,1.880321,19.0255,28.63179,18.31997,18.31909,0.09945312,0.09993627,0.09775415,448.0
std,,35.79781,0.06467158,0.08721381,66631420000.0,0.1159976,0.1636134,0.405085,1.736753,1.729471,0.004331563,0.005399512,0.001321515,111.7404


In [None]:
# Initial Datashade scatter plot of backscatter and fluorescence with no selection widgets
pts = hv.Points(dfm, ['hs2_bb700 (m-1)', 'hs2_fl700'])
datashade(pts).opts(hv.opts.RGB(width=800, height=500))

In [None]:
# Datashader timeseries plot of backscatter (hs2_bb700)
bb700_pts = hv.Points(dfm, ['timevalue', 'hs2_bb700 (m-1)'])
datashade(bb700_pts).opts(hv.opts.RGB(width=900, height=300))

In [None]:
# Function to return a generic range selection widget for any column in the dataframe
def sliderType(colmn, range_name):
    return pn.widgets.RangeSlider(
        start=dfm[colmn].min(),
        end=dfm[colmn].max(),
        value=(dfm[colmn].min(), dfm[colmn].max()),
        name=range_name,
        width=600,
        step=dfm[colmn].max() / 200,
    )

In [None]:
# Datashade scatter plot of backscatter and fluorescence with Depth and Date selection widgets
dsticker = pn.widgets.Select(options=['dorado', 'dorado_Gulper'], name='platform')
def dsAUV_df(dsticker):
    df = dfm[dfm['platform'] == dsticker]
    return df

dsdf = hvplot.bind(dsAUV_df, dsticker).interactive()

# Dynamic hvplot will not work unless we use this dskind
dskind = pn.widgets.Select(name='kind', value='scatter', options=['scatter'], visible=False)
dsd = sliderType("depth", "Depth Range") 
dst = pn.widgets.DateRangeSlider(
    start=dfm.timevalue.min(),
    end=dfm.timevalue.max(),
    value=(dfm.timevalue.min(), dfm.timevalue.max()),
    name="Date Range",
)
dsplt = dsdf[
    (dsdf.depth >= dsd.param.value_start)
    & (dsdf.depth <= dsd.param.value_end)
    & (dsdf.timevalue >= dst.param.value_start)
    & (dsdf.timevalue <= dst.param.value_end)
].hvplot(
    kind=dskind,
    x="hs2_bb700 (m-1)",
    y="hs2_fl700",
    grid=False,
    title=dsticker,
    datashade=True,
    xlim=(0, 0.06),
    ylim=(0, 0.02),
    dynamic=True,
    height=400,
)
dsplt